Accumulated Feedback on PRI #274

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Tue Apr 1, 2014
Name: Asmus Freytag
Opt Subject: UAX #9 (PRI 274)

Add as implementation note to UAX#9

During line breaking, if a line is broken at the location of a SHY, the text
around the line break may change. A common case is the replacement of the
invisible SHY by a visible HYPHEN, but see Section x.x in the Unicode
Standard.

For the purposes of the Bidi Algorithm, apply steps .. to .. after any
substitutions have been made, using the directional classes for the
substituted characters, instead of a single BN for the SHY character.

[example]

Note, no special action need be taken for a SHY character in the middle of a
line, unless it is rendered as a visible glyph in a "show hidden character"
mode. In the latter case, the recommendation would be to treat the visible
symbol substituted for the SHY as having bidi class ON.

Date/Time: Mon Apr 21 20:36:33 CDT 2014
Name: James Clark
Report Type: Error Report
Opt Subject: Unclear wording in UAX#9, rule W6

Rule W6 in UAX#9 (http://www.unicode.org/reports/tr9/tr9-29.html#W6)

says:

"Otherwise, separators and terminators change to Other Neutral".

It wasn't immediately clear to me whether "separators" here was intended to
include type S (Segment Separators).  I suspect not, because the title of the
section is "Resolving Weak Types" and type S is neutral rather than weak.

I suggest this should be made explicit.
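
Under the reading suspected above (that "separators" means the weak types ES
and CS, and "terminators" means ET, with type S left alone), W6 could be
sketched as follows. The function name and the list-of-type-names
representation are assumptions made for illustration, and the sketch presumes
that W4 and W5 have already handled the separators and terminators they apply to.

```python
# Weak types assumed to be covered by "separators and terminators" in W6.
W6_TARGETS = {"ES", "ET", "CS"}


def apply_w6(types):
    """Return a new list in which any remaining ES, ET or CS becomes ON."""
    return ["ON" if t in W6_TARGETS else t for t in types]


# Segment separators (S) are neutral, so this reading leaves them untouched.
print(apply_w6(["L", "ES", "EN", "S"]))
```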

Date/Time: Tue Apr 22 13:52:26 CDT 2014
Name: Asmus Freytag
Report Type: Error Report
Opt Subject: Obfuscating language in BD16 and N0 of UAX#9

UAX#9 uses an unnecessarily involved algorithmic description for a paired bracket.
This makes that part of the bidi algorithm difficult to understand. In
particular, authors who are not programmers will not be able to predict
which brackets will be handled correctly. This also affects text assembled
programmatically.

Accordingly, the unclear language should be replaced as follows
---------

BD16a  A bracket pair is a pair of characters consisting of an opening
   paired bracket and a closing paired bracket
   such that the Bidi_Paired_Bracket property value of the former
   character or its canonical equivalent equals the latter character or
   its canonical equivalent.

BD16b  A resolved bracket pair is a bracket pair that has been
   selected from among possible bracket pairs in an isolating run
   sequence.

Note: for the PBA this selection is performed according to Rx (below).


Rx  For each isolating run sequence, bracket characters are selected
   into resolved bracket pairs as follows:
   Starting at the beginning of the run sequence, when a closing
   bracket character is encountered, find the nearest preceding
   opening character that forms a bracket pair with it, but is not
   already part of a resolved bracket pair and is not ignored for
   bracket pair selection.
   If one exists, resolve the pair, and mark any enclosed
   opening brackets of any kind as not part of a bracket pair and
   ignored for further bracket pair selection. Otherwise, if no pair
   can be selected, mark the closing bracket as not part of a pair
   and ignored for further pair selection.
	

Note: the outcome of Rx is a list of resolved pairs and their 
locations. Selected pairs can nest, but can't otherwise overlap.
The rule prefers the closest pair for matching as opposed
to attempting to select for the most hierarchical set of nested
pairs. (See examples).

------------
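
The selection described in Rx can be sketched in a few lines. This is only an
illustration under simplifying assumptions: the input is a plain string standing
in for one isolating run sequence, the PAIRS table is a hypothetical three-entry
subset of the Bidi_Paired_Bracket data, and canonical equivalents and bidi types
are not considered.

```python
# Hypothetical subset of Bidi_Paired_Bracket data: opening -> closing.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def resolve_bracket_pairs(text):
    """Return sorted (open_index, close_index) pairs selected as in Rx."""
    resolved = []
    unavailable = set()  # openers already paired or marked as ignored
    for i, ch in enumerate(text):
        if ch not in CLOSERS:
            continue
        # Walk backwards to the nearest available opener that pairs with ch.
        for j in range(i - 1, -1, -1):
            if j not in unavailable and PAIRS.get(text[j]) == ch:
                resolved.append((j, i))
                unavailable.add(j)
                # Enclosed openers of any kind are ignored from now on.
                for k in range(j + 1, i):
                    if text[k] in PAIRS:
                        unavailable.add(k)
                break
        # If the loop found no opener, the closer simply stays unpaired.
    return sorted(resolved)


print(resolve_bracket_pairs("a(b[c)d]"))
```

In "a(b[c)d]" the parentheses are selected and the enclosed "[" is then ignored,
so the later "]" finds no partner. In "a(b(c)d" the closing bracket pairs with
the nearer of the two openers, which is the "closest pair" preference mentioned
in the note.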

What I have called Rx here would become N0a, with the part of N0 that is
the second bullet numbered N0b.

I would move the existing examples from BD16 into the rules section, not leave them
in the definitions as they are today.

Rule N0 (second bullet) would change from:

For each bracket-pair element in the list of pairs of text positions...

to 

For each resolved bracket pair...

Date/Time: Fri Apr 25 09:56:00 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject:

This is an editing suggestion for UAX#9; it is not intended to change anything in
the behavior or results of applying the UBA to bidirectional text.

I agree with Asmus Freytag that the definitions in BD16 are too complicated and 
involve an algorithm as part of the definition.  I suggest the following 
alternative text for BD16:

   A bracket pair is a pair of characters, an opening paired bracket
   and a closing paired bracket within the same isolating run sequence,
   such that the Bidi_Paired_Bracket property value of the former
   character or its canonical equivalent equals the latter character
   or its canonical equivalent, and provided that a closing bracket is
   matched to the closest match candidate, disregarding any candidates
   that either already have a closer match, or are enclosed in a
   matched pair of two other bracket characters.

Date/Time: Fri Apr 25 10:01:07 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Unclear wording in paragraph 3.1.2 of UAX#9

These comments are of a purely editorial nature and are not intended
to change anything in the behavior or results of the UBA.

The UBA has this sentence in paragraph 3.1.2 near its very end:

  As rule X10 will specify, an isolating run sequence is the unit to which 
  the rules following it are applied, and the last character of one level 
  run in the sequence is considered to be immediately followed by the first 
  character of the next level run in the sequence during this phase of 
  the algorithm.

The "rules following it" part is a bad referent.  I suggest replacing
it with the following unambiguous reference:

  As rule X10 will specify, an isolating run sequence is the unit to 
  which the rules following X10 are applied, ...

Date/Time: Fri Apr 25 10:06:44 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Usage of "current embedding level" in paragraph 3.3.2 of UAX#9

This change suggestion is of a purely editorial nature, and doesn't 
intend to change any behavior.

In rule X6 of the UBA, we have this language:

  X6. For all types besides B, BN, RLE, LRE, RLO, LRO, PDF, RLI, LRI, FSI, and PDI:

   • Set the current character’s embedding level to the embedding level 
   of the last entry on the directional status stack.
   • Whenever the directional override status of the last entry on the directional 
   status stack is not neutral, reset the current character type according to the 
   directional override status of the last entry on the directional status stack.

In other words, if the directional override status of the last entry on the 
directional status stack is neutral, then characters retain their normal types: 
Arabic characters stay AL, Latin characters stay L, spaces stay WS, and so on. If 
the directional override status is right-to-left, then characters become R. If the 
directional override status is left-to-right, then characters become L.

  Note that the current embedding level is not changed by this rule.

Note the last sentence.  Its reference to the "current embedding level" is unclear
and confusing, more so because the previous text mentions "the current character's
embedding level", which _is_ changed by this rule.  I believe the intent was to say
that the embedding level of the last entry on the directional status stack is not
changed by X6.  If so, I suggest saying that explicitly.
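
The distinction can be made concrete with a small sketch of X6 as quoted above.
The StackEntry class, the string values for the override status, and the
apply_x6 name are all assumptions made for illustration; the caller is assumed
to have already excluded the types listed in the rule (B, BN, RLE, and so on).

```python
from dataclasses import dataclass


@dataclass
class StackEntry:
    level: int      # embedding level of this directional status stack entry
    override: str   # "neutral", "ltr" or "rtl"


def apply_x6(char_type, stack):
    """Return (embedding_level, resolved_type) for one character.

    The character receives the level of the last stack entry, and its type
    is reset only when that entry's override status is not neutral. The
    stack itself is read but never modified.
    """
    top = stack[-1]
    if top.override == "rtl":
        char_type = "R"
    elif top.override == "ltr":
        char_type = "L"
    return top.level, char_type


stack = [StackEntry(0, "neutral"), StackEntry(1, "rtl")]
print(apply_x6("L", stack))
```

The character's level and type change, while the last stack entry keeps its
level, which is the reading of the note suggested above.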

Date/Time: Fri Apr 25 10:10:35 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Unclear language in rule X10 of UAX#9

This comment is of editorial nature and doesn't request any changes in behavior.

Rule X10 of UAX#9 includes this bullet:

  Apply rules W1–W7, N0–N2, and I1–I2, in the order in which they appear below, 
  to each of the isolating run sequences, applying one rule to all the characters 
  in the sequence in the order in which they occur in the sequence before applying 
  another rule to any part of the sequence. The order that one isolating run 
  sequence is treated relative to another does not matter.

This says nothing at all about the order of applying the rules W1-W7, N0-N2, and
I1-I2 between the different isolates.  I suggest the following clearer rewording:

    Apply rules W1–W7, N0–N2, and I1–I2 to each of the isolating run sequences.
    For each sequence, apply each rule completely, in the order in which the rules appear below.
    The order that one isolating run sequence is treated relative to another does not matter.
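
The ordering in the suggested rewording amounts to a nested loop. The sketch
below uses hypothetical tracing rules in place of the real W, N and I rules,
and represents each isolating run sequence as a dictionary with an "id" key;
both are assumptions made only to show the order of application.

```python
def resolve_sequences(sequences, rules):
    """Apply every rule, in the given order, to each isolating run sequence."""
    for seq in sequences:      # the order of the sequences does not matter
        for rule in rules:     # W1..W7, N0..N2, I1..I2 in document order
            rule(seq)          # a rule finishes with seq before the next starts


def make_tracing_rule(name, log):
    """Hypothetical stand-in for a real rule: it only records that it ran."""
    def rule(seq):
        log.append((seq["id"], name))
    return rule


log = []
rules = [make_tracing_rule(name, log) for name in ("W1", "W2", "N0")]
resolve_sequences([{"id": "A"}, {"id": "B"}], rules)
print(log)
```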

Date/Time: Fri Apr 25 10:18:44 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Ambiguous language in rule N0 of UAX#9

This comment is of a purely editorial nature; it does not require any changes in behavior.

Rule N0 of the UBA says, among other things:

  For each bracket-pair element in the list of pairs of text positions
  a. Inspect the bidirectional types of the characters enclosed within the bracket pair.
  b. If any strong type (either L or R) matching the embedding direction is found, set
  the type for both brackets in the pair to match the embedding direction.

But there's no explanation of what is meant by matching a strong type, L or R, to the
embedding direction.  I think this requires some specific definition to become clear.

Likewise, table 3 in paragraph 3.1.4 talks about "text ordering [...] that matches 
the embedding level direction (even or odd)", but never explains what such a match means.

I suggest stating in both places that L matches an even embedding level,
whereas R matches an odd one.
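
Spelled out as code, the suggested definition is a parity check. The function
names below are assumptions made for illustration.

```python
def embedding_direction(level):
    """Even embedding levels are left-to-right (L); odd ones are right-to-left (R)."""
    return "L" if level % 2 == 0 else "R"


def matches_embedding_direction(strong_type, level):
    """True when a strong type (L or R) agrees with the direction of the level."""
    return strong_type == embedding_direction(level)


print(matches_embedding_direction("R", 1))
```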

Date/Time: Fri Apr 25 10:22:57 CDT 2014
Name: Eli Zaretskii
Report Type: Error Report
Opt Subject: Obfuscated definition of "isolating run sequence" in UAX#9

This change suggestion is of a purely editorial nature.

UAX#9 defines an "isolating run sequence" in BD13 in a way that is unnecessarily
complex and hard to understand.  In a nutshell, little is said beyond giving an
algorithm to compute the set of all isolating run sequences for a paragraph.

I suggest including the following formal definition of an isolating run
sequence in BD13:

  An isolating run sequence is the maximal sequence of level runs of
  the same embedding level that can be obtained by removing all the
  characters between an isolate initiator and its matching PDI (or
  paragraph end, if there is no matching PDI) within those level runs.
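
For comparison with the definition above, here is a rough sketch of how the
sequences can be computed for a single paragraph. It assumes the bidi types and
resolved embedding levels are given as parallel lists, that the levels are
consistent with the isolates (the content of an isolate sits at a different
level from its initiator and PDI), and that explicit embedding and override
characters have already been dealt with. All function names are hypothetical.

```python
INITIATORS = {"LRI", "RLI", "FSI"}


def matching_pdis(types):
    """Map the index of each isolate initiator to the index of its matching PDI."""
    match, open_initiators = {}, []
    for i, t in enumerate(types):
        if t in INITIATORS:
            open_initiators.append(i)
        elif t == "PDI" and open_initiators:
            match[open_initiators.pop()] = i
    return match


def level_runs(levels):
    """Split the indices into maximal runs of equal embedding level."""
    runs, start = [], 0
    for i in range(1, len(levels) + 1):
        if i == len(levels) or levels[i] != levels[start]:
            runs.append(list(range(start, i)))
            start = i
    return runs


def isolating_run_sequences(types, levels):
    """Chain level runs across each initiator and its matching PDI."""
    match = matching_pdis(types)
    matched_pdis = set(match.values())
    runs = level_runs(levels)
    run_of = {idx: r for r, run in enumerate(runs) for idx in run}
    sequences = []
    for run in runs:
        if run[0] in matched_pdis:
            continue  # this run continues a sequence started earlier
        seq = list(run)
        while seq[-1] in match:  # run ends with a matched isolate initiator
            seq.extend(runs[run_of[match[seq[-1]]]])
        sequences.append(seq)
    return sequences


# "a RLI b PDI c": the isolated "b" forms its own sequence.
print(isolating_run_sequences(["L", "RLI", "R", "PDI", "L"], [0, 0, 1, 0, 0]))
```

In the example, the outer sequence consists of indices 0, 1, 3 and 4, that is,
the level-0 text with the character between the initiator and its matching PDI
removed, as the proposed definition describes.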