Unicode 3.1: IDS and ZW(N)J

From: John Cowan (jcowan@reutershealth.com)
Date: Wed Jan 24 2001 - 10:04:40 EST


There are two problems with IDS in Unicode 3.1:

1) The new unified ideographs U+20000 to U+2A6D6 need to be added to the
formal grammar of IDSes. The new compatibility ideographs
U+2F800 to U+2FA1D should be explicitly excluded from IDSes.
This is editorial.
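A minimal Python sketch of the leaf test this change implies. The helper name
is_ids_ideograph is hypothetical. Only the U+20000..U+2A6D6 and
U+2F800..U+2FA1D ranges come from the text above. The Extension A and URO
bounds (U+3400..U+4DB5, U+4E00..U+9FA5) are my assumption about the Unicode 3.1
repertoire. Radicals, and any other leaf characters the grammar may permit,
are left out for brevity.

```python
# Hypothetical helper; the Extension A and URO bounds are assumed 3.1 values.

# Unified ideograph ranges allowed as IDS leaves (inclusive bounds).
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DB5),    # CJK Unified Ideographs Extension A (assumed bounds)
    (0x4E00, 0x9FA5),    # CJK Unified Ideographs (URO), assumed 3.1 bounds
    (0x20000, 0x2A6D6),  # new in 3.1: to be added to the IDS grammar
]

# New compatibility ideographs in 3.1: to be excluded explicitly.
COMPAT_IDEOGRAPH_SUPPLEMENT = (0x2F800, 0x2FA1D)


def is_ids_ideograph(cp: int) -> bool:
    """Return True if code point cp may appear as an ideograph leaf in an IDS."""
    compat_start, compat_end = COMPAT_IDEOGRAPH_SUPPLEMENT
    if compat_start <= cp <= compat_end:
        return False
    return any(start <= cp <= end for start, end in UNIFIED_IDEOGRAPH_RANGES)
```

For example, is_ids_ideograph(0x20000) is True, while is_ids_ideograph(0x2F800)
is False.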

2) TUS 3.0 says (p. 271):

# An implementation may render a valid [IDS] [...] by parsing the
# [IDS] and drawing the ideograph so described. In [that] case, the
# [IDS] should be treated as a ligature of the individual
# characters for purposes of hit testing, cursor movement,
# and other user interface operations.

Since ZW(N)J is the standard Unicode means of requesting or suppressing
ligation, it would be useful to allow it in IDSes in order to encourage
or inhibit this ligaturing behavior. Adding a rule

        Joiner ::= U+200C | U+200D

and modifying the existing rule for IDS to allow these cases:

        BinaryDescriptionOperator IDS Joiner IDS
        TrinaryDescriptionOperator IDS Joiner IDS IDS
        TrinaryDescriptionOperator IDS IDS Joiner IDS
        TrinaryDescriptionOperator IDS Joiner IDS Joiner IDS

would do the trick.

This means that sequences such as <U+2FF0, U+3400, U+200C, U+3401>,
which are not IDSes under the current definition, would become IDSes
under the new one.
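To make the proposed rules concrete, here is a rough recursive-descent
recognizer in Python. It is a sketch under stated assumptions, not a reference
implementation. The names parse_ids and is_ids are hypothetical. Leaves are
restricted to the unified-ideograph ranges (with the same assumed 3.1 bounds for
Extension A and the URO). Operator arities follow the ideographic description
characters: U+2FF2 and U+2FF3 take three operands, and the rest of
U+2FF0..U+2FFB take two. Passing allow_joiners=False gives the current grammar.
Passing True gives the proposed one, in which a single Joiner may sit between
consecutive operands but never before the first.

```python
# Sketch of the proposed IDS grammar with optional ZWNJ/ZWJ joiners.
# Assumption: only unified ideographs are accepted as leaves.

JOINERS = {0x200C, 0x200D}  # Joiner ::= U+200C | U+200D

# U+2FF2 and U+2FF3 are trinary; the other IDCs in U+2FF0..U+2FFB are binary.
TRINARY_OPS = {0x2FF2, 0x2FF3}
BINARY_OPS = set(range(0x2FF0, 0x2FFC)) - TRINARY_OPS

# Assumed Unicode 3.1 bounds: Extension A, URO, Extension B.
LEAF_RANGES = [(0x3400, 0x4DB5), (0x4E00, 0x9FA5), (0x20000, 0x2A6D6)]


def is_leaf(cp):
    return any(start <= cp <= end for start, end in LEAF_RANGES)


def parse_ids(cps, pos, allow_joiners):
    """Parse one IDS starting at cps[pos].

    Return the index just past it, or None if no IDS starts there."""
    if pos >= len(cps):
        return None
    cp = cps[pos]
    if is_leaf(cp):
        return pos + 1
    if cp in BINARY_OPS:
        arity = 2
    elif cp in TRINARY_OPS:
        arity = 3
    else:
        return None
    pos += 1
    for i in range(arity):
        # Under the proposal, one optional joiner may separate consecutive
        # operands; it may not precede the first operand.
        if i > 0 and allow_joiners and pos < len(cps) and cps[pos] in JOINERS:
            pos += 1
        pos = parse_ids(cps, pos, allow_joiners)
        if pos is None:
            return None
    return pos


def is_ids(cps, allow_joiners=True):
    """Return True if the whole sequence is exactly one IDS."""
    cps = list(cps)
    return parse_ids(cps, 0, allow_joiners) == len(cps)


example = [0x2FF0, 0x3400, 0x200C, 0x3401]
print(is_ids(example, allow_joiners=False))  # current grammar
print(is_ids(example, allow_joiners=True))   # proposed grammar
```

With the current grammar the example is rejected, because U+200C cannot begin an
IDS. With the proposed grammar it is accepted.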

-- 
There is / one art             || John Cowan <jcowan@reutershealth.com>
no more / no less              || http://www.reutershealth.com
to do / all things             || http://www.ccil.org/~cowan
with art- / lessness           \\ -- Piet Hein



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT