L2/04-079

Addendum to Comments on Public Review Issues

The sections below contain comments received on the open Public Review Issues between January 28 and February 5, 2004, after posting of document L2/04-026.

20 Draft UTR #31 Identifier and Pattern Syntax

Date/Time: Mon Feb 2 16:46:54 EST 2004
Contact: Martin Duerst

This is a comment on behalf of the W3C I18N WG.

With respect to Review issue 20, Draft UTR #31 Identifier and Pattern Syntax, we think that making non-characters default-ignorable is a bad idea.

28 BIDI Boundary_Neutral Property Value

Date/Time: Mon Feb 2 15:46:08 EST 2004
Contact: Tim Partridge

Some of the Default Ignorable characters are in that category because they should not be present in the data stream, or because their properties are not explicitly defined in the version of Unicode understood by the implementation, but they are in the range reserved for format characters. I have no objection to these being made boundary neutral (BN).

Other Default Ignorable characters have defined uses in Unicode, but may be safely ignored by an implementation that does not support them. While it may seem tempting to make these characters BN as well, I am concerned that implementations that *do* support them will encounter difficulties, because BN characters do not go to defined positions in the output from the bidi algorithm. (Indeed, officially they can be deleted.)

In particular, U+180B..U+180D MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE, U+FE00..U+FE0F VARIATION SELECTOR-1..VARIATION SELECTOR-16 and U+E0100..U+E01EF VARIATION SELECTOR-17..VARIATION SELECTOR-256 are all supposed to immediately follow the character they influence. Currently this is achieved by giving them a bidi class of NSM, which effectively keeps them adjacent to the preceding character. Changing their class to BN would potentially allow them to become separated from it. Although this could be overcome by referring back to the original data (as is done for the joiners), it seems more trouble than it is worth compared to the existing working solution.
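The effect of the NSM class described above can be illustrated with a minimal Python sketch. It is not a bidi implementation: `resolve_nsm` is a hypothetical helper that applies only a simplified form of the rule that a nonspacing mark takes the class of the preceding character, and the `start_class` parameter is an assumption standing in for the class at the start of the run. The class lookup uses the standard `unicodedata.bidirectional` function, whose results depend on the Unicode version shipped with the Python build.

```python
import unicodedata


def resolve_nsm(classes, start_class="L"):
    """Simplified sketch: each NSM takes the class of the preceding character.

    `start_class` is an assumed stand-in for the class at the start of the
    run; it is used only when an NSM appears first.
    """
    resolved = []
    prev = start_class
    for cls in classes:
        if cls == "NSM":
            cls = prev  # the mark inherits the class of what precedes it
        resolved.append(cls)
        prev = cls
    return resolved


# HEBREW LETTER ALEF followed by VARIATION SELECTOR-1
text = "\u05D0\uFE00"
classes = [unicodedata.bidirectional(ch) for ch in text]
print(classes)
print(resolve_nsm(classes))
```

Because the variation selector ends up with the same class as its base, the two are reordered together. A BN character gets no such treatment, which is the separation risk raised in this comment.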

It also seems desirable to keep U+00AD SOFT HYPHEN at its original position in the text, as it is used to determine line break positions. The Hangul filler characters are also supposed to stay in sequence so that they form syllables when rendered.

U+034F COMBINING GRAPHEME JOINER can be made BN as it should not affect rendering (at least in modern implementations).

I am not sure that there is any need to define a bidi property for the non-characters. Any use of the non-characters would be for private implementation purposes and the implementation would decide its own bidi property for each character. Any public exchange of these characters cannot be expected to produce any defined results, and most implementations would either filter out the characters to prevent internal chaos, or would not produce a visible rendering.
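As an illustration of the filtering mentioned above, the following Python sketch removes non-characters from a string before further processing. The function names `is_noncharacter` and `strip_noncharacters` are hypothetical; the ranges used are the 66 non-characters defined by Unicode: U+FDD0..U+FDEF, plus the last two code points of each of the 17 planes. Whether to drop these characters or handle them some other way remains an implementation decision.

```python
def is_noncharacter(cp):
    """Return True if code point `cp` is one of the 66 Unicode non-characters."""
    # U+FDD0..U+FDEF, or any code point whose low 16 bits are FFFE or FFFF.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def strip_noncharacters(text):
    """Drop non-characters from `text`, leaving all other characters untouched."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


sample = "ab\uFDD0c\uFFFEd"
print(strip_noncharacters(sample))
```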