Re: Biblical Hebrew (Was: Major Defect in Combining Classes of Tibetan Vowels)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Jun 27 2003 - 11:35:03 EDT


    On Friday, June 27, 2003 4:40 PM, John Cowan <jcowan@reutershealth.com> wrote:
    > Not so. Sometimes stability is more important than correctness.

    Very well answered. I don't see why we would need to sacrifice stability
    in order to correct something. As the error is not in ISO10646, it is
    certainly not reasonable to have ISO10646 endorse an error made by
    Unicode because of its stability pact.

    For now, the only good solution is to use existing Unicode-only resources
    that will impact neither the normalization pact nor the ISO10646
    unification work. If this requires defining some additional Unicode
    semantics or properties for some language-significant markup characters,
    this can be done with variants (if ISO10646 accepts them), or with a
    request to ISO10646 for a dedicated new *invisible* diacritic in the
    Hebrew block.

    Maybe Unicode should be more prudent with Normalization Forms: when
    new characters are added, their combining classes should be
    documented as informative until there is consensus and some
    experimentation. This would not break the stability pact with XML,
    which would simply not accept the new characters until they are
    stabilized by Unicode.

    The characters could then be standardized by Unicode and ISO10646, but
    used with caution in XML, which could restrict the set of supported
    characters to only those whose canonicalization is finished.

    Why not, then, document these critical normative properties in a way
    that clearly marks them as informative when needed? For example,
    informative canonical decompositions could be tagged with <canon>
    (and thus recognized only by compatibility decompositions until
    further notice).

    And proposed combining classes could be flagged with an additional
    symbol in the combining class column of the UCD (for example a "?");
    an illustration of both notations follows.
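
    For illustration only (the XXXX/YYYY/ZZZZ code points, names, and
    values below are invented placeholders, not proposed entries), a
    provisional UCD line next to a real existing one might look like:

        05B8;HEBREW POINT QAMATS;Mn;18;NSM;;;;;N;;;;;
        XXXX;NEW HEBREW POINT (PROPOSED);Mn;18?;NSM;;;;;N;;;;;
        YYYY;NEW HEBREW LETTER (PRECOMPOSED);Lo;0;R;<canon> ZZZZ 05B8;;;;N;;;;;

    The first line is the actual entry for QAMATS; the second shows a
    "?" flagging a proposed combining class; the third shows an
    informative canonical decomposition tagged with <canon> so that,
    until further notice, only compatibility decompositions would
    recognize it.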

    This would prevent the character from being used within XML-compliant
    applications, but it could allow faster development of fonts,
    renderers, and layout engines, and permit experiments that encode
    actual new documents with some safeguards regarding the final
    character properties.

    This would signal a warning to the IETF and W3C: this character has
    an informative combining class or decomposition, normalization at
    this stage is dangerous, and documents should be considered as
    already normalized with respect to those characters (a sketch of
    such handling follows).
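
    As a minimal sketch (in Python) of the caution being asked of
    applications, where the PROVISIONAL set is a placeholder standing in
    for characters whose UCD properties are still flagged informative,
    here filled with a private-use character for the example:

        import unicodedata

        # Characters whose combining class or decomposition is still only
        # informative (in practice, derived from the "?"-flagged UCD
        # entries). U+E000 is just a stand-in for this example.
        PROVISIONAL = {"\uE000"}

        def safe_nfc(text):
            # Treat any text containing a provisional character as already
            # normalized; otherwise apply NFC as usual.
            if any(ch in PROVISIONAL for ch in text):
                return text
            return unicodedata.normalize("NFC", text)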

    These potentially unstable Unicode-encoded documents would then be
    labelled with the Unicode version, as a future revision may require
    verifying whether the informative properties have become normative.
    If the properties change, existing documents can be tested to see
    whether they still respect the proposed normalization, and corrected.
    If there is no change after, say, one year, a revision annex
    publishes these properties as normative and an incremental version
    of Unicode is released, allowing interchange and conservation of the
    encoded documents without an explicit Unicode version label.
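
    A sketch (Python again) of the version check such labelling would
    enable; the function name and the example label are invented for
    illustration:

        import unicodedata

        def needs_reverification(doc_label: str) -> bool:
            # If the running UCD is newer than the version the document
            # was labelled with, the informative properties may have
            # become normative in the meantime, so the document should
            # be re-tested against the now-normative normalization.
            current = tuple(map(int, unicodedata.unidata_version.split(".")))
            labelled = tuple(map(int, doc_label.split(".")))
            return current > labelled

        # e.g. a document labelled "4.0.0" checked against today's UCD:
        # needs_reverification("4.0.0")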


