L2/07-049

Asmus Freytag
February 2, 2007

In response to the following action items

109-A47 Asmus Freytag Add a note to UAX #11 about why EAW does not obey canonical equivalence.    
109-A48 Asmus Freytag Add a note to UAX #14 about why LB does not obey canonical equivalence, and list the exceptions from L2/06-386. L2/06-386 In progress


I submit this document for discussion at the upcoming UTC meeting #110.


Background

Document L2/06-386 noted that several properties do not preserve canonical equivalence. For East_Asian_Width, this is by construction, so to speak: the mapping tables from the different legacy sets go to a specific precomposed form of the character or, in the case of a singleton decomposition, to one or the other of the two characters. Little is gained by claiming that all decomposition targets are of equal status, because the primary purpose of the property is to document which specific Unicode characters correspond to mappings from both East Asian and standard character sets.
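The kind of mismatch described above can be listed mechanically. The following is only a sketch, assuming Python's standard unicodedata module, whose data reflects whatever Unicode version ships with the interpreter rather than the version under discussion here. The helper names singleton_target and eaw_singleton_mismatches are made up for this illustration.

```python
import unicodedata


def singleton_target(ch):
    """Return the one character that ch canonically decomposes to, or None.

    Only the one-step mapping from UnicodeData.txt is consulted, so this
    is not a full NFD normalization.
    """
    mapping = unicodedata.decomposition(ch)
    # An empty string means no decomposition; a leading "<tag>" marks a
    # compatibility (not canonical) decomposition.
    if not mapping or mapping.startswith("<"):
        return None
    parts = mapping.split()
    if len(parts) != 1:
        return None  # decomposes to a sequence, so not a singleton
    return chr(int(parts[0], 16))


def eaw_singleton_mismatches(code_points):
    """List singleton decompositions whose two ends differ in East_Asian_Width."""
    found = []
    for cp in code_points:
        ch = chr(cp)
        target = singleton_target(ch)
        if target is None:
            continue
        source_width = unicodedata.east_asian_width(ch)
        target_width = unicodedata.east_asian_width(target)
        if source_width != target_width:
            found.append((cp, source_width, ord(target), target_width))
    return found
```

Calling eaw_singleton_mismatches(range(0x110000)) scans the whole code space. The resulting list depends on the Unicode data bundled with the interpreter, so it may not match the list in L2/06-386 exactly.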

The Line_Break property is affected insofar as its AI class is populated by design with characters that are of ambiguous width, but which may decompose to AL characters. The ambiguous characters are the ones for which an implementation has to decide whether they line break as ideographs or as regular letters/symbols. Again, little is gained by increasing the pool of characters that are AI, as any implementation is free to resolve the AI class in such a manner that the resolved property preserves canonical equivalence. In the unlikely case that this yields unacceptable line break behavior, it may be necessary to tailor the membership of the AL and ID classes, which is also permitted.

Actions taken


Short notices to this effect have been placed into proposed updates for the appropriate UAX.


Further proposed action

There is one exception to the general rule that inconsistencies in the line break case arise only from ambiguous East Asian Width.

[Line_Break=Alphabetic→Break_Before, Script=Greek→Common]

1FFD # Sk U+1FFD ( ´ ) GREEK OXIA U+00B4 ( ´ ) ACUTE ACCENT

It is therefore proposed to change 1FFD from AL to BB to make it match 00B4.
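The effect of the proposed change can be checked with a small script. This is a sketch under stated assumptions: Python's standard library does not expose Line_Break, so the two tables below are hypothetical and contain only the two values named in this document. A real check would load LineBreak.txt instead. The helper name lb_singleton_mismatches is likewise invented for the illustration.

```python
import unicodedata

# Hypothetical tables holding only the values stated in this document.
LB_CURRENT = {0x1FFD: "AL", 0x00B4: "BB"}
LB_PROPOSED = {**LB_CURRENT, 0x1FFD: "BB"}  # the change proposed above


def lb_singleton_mismatches(table):
    """List singleton canonical decompositions whose ends differ in class."""
    found = []
    for cp, lb_class in sorted(table.items()):
        mapping = unicodedata.decomposition(chr(cp))
        # Skip characters with no decomposition or a compatibility one.
        if not mapping or mapping.startswith("<"):
            continue
        targets = [int(h, 16) for h in mapping.split()]
        # Only singletons whose target is also in the table are compared.
        if len(targets) != 1 or targets[0] not in table:
            continue
        target_class = table[targets[0]]
        if target_class != lb_class:
            found.append((cp, lb_class, targets[0], target_class))
    return found
```

With the current values the function reports the single pair discussed above, and with the proposed value for 1FFD it reports none.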