Re: New Public Review Issue: UAX #24 Script Names

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jan 20 2006 - 14:39:37 CST


    From: "Rick McGowan" <rick@unicode.org>

    > Review period for the new item closes on January 30, 2006.
    >
    > Proposed Update to UAX #24: Script Names
    > http://www.unicode.org/reports/tr24/tr24-8.html
    >
    > This proposed update contains a proposed change in default script value
    > for unassigned characters from Common to a new value Unknown, and a
    > correction for the contents of the Script=Inherited value.

    I see almost no impact from this change (except for regular expressions that match non-standard, i.e. unassigned, characters and treat them like Common characters).

    In any case, any string containing non-standard characters is handled unpredictably once these characters are assigned and given a script property other than Common.

    So new strings passed through algorithms based on old versions of the UCD are already affected, and the proposed update will not change this: currently unassigned characters will still be assigned later, and will then move from the "Unknown" script to some other script.

    However, I do see a significant change if a process currently expects that every character matching the regular expression "[^[:Common:]]" is assigned and has stable normalization and stable normative properties. With the change, it will be necessary to exclude [:Unknown:] as well from the character class above.
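
    To make the concern concrete, here is a minimal sketch in Python. The script table and the helper names (TOY_SCRIPTS, script_of, non_common, non_common_assigned) are hypothetical and only for illustration; a real implementation would read Scripts.txt from the UCD. The "default" argument stands for the script value given to unassigned code points, before and after the proposed update.

```python
# Toy script table (an assumption for illustration, not real UCD data).
TOY_SCRIPTS = {"A": "Latin", "\u03b1": "Greek", " ": "Common", "1": "Common"}

def script_of(ch, default):
    """Return the toy script of ch, or `default` for code points not in the table."""
    return TOY_SCRIPTS.get(ch, default)

def non_common(text, default):
    """Emulate [^[:Common:]]: keep characters whose script is not Common."""
    return [ch for ch in text if script_of(ch, default) != "Common"]

def non_common_assigned(text, default):
    """Emulate the adjusted class: exclude both Common and Unknown."""
    return [ch for ch in text
            if script_of(ch, default) not in ("Common", "Unknown")]

sample = "A \u0378"  # U+0378 is absent from the toy table, so it gets the default

before = non_common(sample, default="Common")           # old default
after = non_common(sample, default="Unknown")           # proposed default
fixed = non_common_assigned(sample, default="Unknown")  # adjusted class
```

    Under these assumptions, "before" and "fixed" contain only "A", while "after" also contains the unassigned code point, which is the behavior change described above.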

    Isn't there another existing standard character property or regular expression that matches unassigned characters without using the new Script property value "Unknown", so that regular expressions can continue to exclude unassigned characters independently of the version of the UCD?
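
    One candidate, offered as a sketch rather than a definitive answer, is the General_Category value Cn (Unassigned), which covers reserved code points and noncharacters. Python's standard unicodedata module exposes it; the helper names is_unassigned and strip_unassigned below are hypothetical.

```python
import unicodedata

def is_unassigned(ch):
    """True if ch has General_Category=Cn in the UCD version bundled with Python."""
    return unicodedata.category(ch) == "Cn"

def strip_unassigned(text):
    """Drop every code point whose General_Category is Cn."""
    return "".join(ch for ch in text if not is_unassigned(ch))

# U+FFFF is a noncharacter, so its General_Category is Cn.
cleaned = strip_unassigned("A\uffffB")
```

    The test itself does not mention any Script value, but which code points are Cn still depends on the UCD version in use (unicodedata.unidata_version in Python).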


