Re: [cowan: Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)]

From: Philippe Verdy (
Date: Fri Jun 27 2003 - 08:34:34 EDT


    On Friday, June 27, 2003 1:29 PM, John Cowan <> wrote:
    > Michael Everson scripsit:
    > Change the character classes in Unicode 4.1, and they *might* decide
    > to freeze support at, say, Unicode 3.0.

    Or they may simply opt to define their *OWN* normalization standard, distinct from the Unicode NF* forms and specified in a separate reference document, removing *all* references to UAX#15 from the XML and IDNA specifications, solely to guarantee the stability that Unicode would be unable to offer.
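    To make the stability concern concrete: canonical ordering, which every NF* form in UAX#15 applies, sorts each run of non-zero-class combining marks by canonical combining class. The normalized form of existing text is therefore a direct function of those class values. The sketch below is a simplified reordering routine, assuming input that is already fully decomposed. The `changed_ccc` table and its class value of 30 are hypothetical, invented only to show that altering one class alters the output.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"

def canonical_order(text, ccc=unicodedata.combining):
    """Simplified canonical ordering: stable-sort each run of marks with a
    non-zero combining class by that class. Assumes `text` is already fully
    decomposed; no decomposition or composition is performed here."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)

word = LAMED + PATAH + HIRIQ  # patah (class 17) typed before hiriq (class 14)

# With the real class values this matches the library's NFD result:
# hiriq is moved in front of patah.
assert canonical_order(word) == unicodedata.normalize("NFD", word)

# Hypothetical: pretend a later version gave hiriq a class above patah's.
def changed_ccc(ch):
    return 30 if ch == HIRIQ else unicodedata.combining(ch)

# The same stored text now "normalizes" differently.
assert canonical_order(word, changed_ccc) != canonical_order(word)
```

    Data normalized under the old classes would no longer compare equal to data normalized under the new ones, which is exactly what XML and IDNA cannot tolerate.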

    Let's not let this happen!

    The IDNA protocol authors have already made many concessions to Unicode, but they may simply abandon their intent to support Unicode normalization of historic scripts that they clearly don't need. This would mean that modern scripts that are not yet encoded could no longer fit within the XML or IDNA frameworks once they are encoded...

    And this would be dramatic for those languages that are still actively used but would be excluded from modern technologies such as XML and IDNA (and very frustrating for their writers, who have few resources and could not lobby the maintainers of other protocol specifications at the same time as Unicode).

    If the supporters of these languages eventually decide that it is more important to make their scripts usable in modern technologies (notably XML), they will prefer to collaborate with the W3C and ISO/IEC 10646 and will completely ignore Unicode's attempts to define "abusive" character properties. Unicode will then have no voice in the standardization of those languages, and will have to endorse the character repertoire registered in ISO/IEC 10646 without any discussion, even if XML usage contradicts Unicode's "normative" rules.

    There's no choice other than maintaining stability. If this means using special characters within combining sequences, that's something Unicode will have to do and document clearly...
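    As an illustration of how such a special character could work without touching any combining class: a character of class 0 placed between two marks splits them into separate runs, so canonical reordering can no longer swap them. Unicode's U+034F COMBINING GRAPHEME JOINER has class 0 and no decomposition; choosing it here is an assumption for the sketch, not something this message specifies. The check below uses only Python's standard `unicodedata` module.

```python
import unicodedata

LAMED, PATAH, HIRIQ = "\u05DC", "\u05B7", "\u05B4"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, canonical combining class 0

plain = LAMED + PATAH + HIRIQ          # intended order: patah, then hiriq
guarded = LAMED + PATAH + CGJ + HIRIQ  # same marks, separated by CGJ

# Without the separator, normalization puts hiriq (14) before patah (17).
assert unicodedata.normalize("NFC", plain) == LAMED + HIRIQ + PATAH

# CGJ has class 0, so each mark sits in its own run and keeps its place
# under every normalization form.
assert unicodedata.combining(CGJ) == 0
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, guarded) == guarded
```

    The cost of this approach is that the text author must insert the extra character explicitly, which is why it would need to be clearly documented.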

    -- Philippe.

    This archive was generated by hypermail 2.1.5 : Fri Jun 27 2003 - 09:13:33 EDT