Re: Unicode Stability

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 02 2005 - 18:28:44 CST

    Peter Kirk said:

    > But I
    > find it hard to reconcile with the whole concept of a standard which is
    > supposed to specify how text should be represented,

    There is a fine line between the standard specifying how
    text should (= "can") be represented and specifying how
    text should (= "ought to") be spelled.

    The former is the appropriate domain of the standard. It consists
    in explaining how the characters are to be used to represent
    text, and therefore forms the fundamental basis for interoperability
    in use of the standard.

    The latter is outside the competence of the UTC and consists of
    specifications of conventions, spelling rules, orthography, and
    such that properly belong to the relevant users (and standardizers)
    of orthographies and writing systems.

    If I am interested in representing the syllable "sa" in Hangul,
    the Unicode Standard tells me that I "should" represent it as:

    <U+1109, U+1161>

    or as:

    U+C0AC

    or as:

    <U+3145, U+314F>

    or as:

    <U+FFB5, U+FFC2>

    It doesn't tell me which of those alternatives I "should" pick to
    be "correct" in a given context, although I might get some clues
    from knowing which are "compatibility" or "half-width" versions
    of the jamos. Nor does it tell me that using "sa" in a particular
    Korean word is even a correct spelling for that word. Maybe I
    was mistaken and it should be spelled "ssa" instead, for example.
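
    As an illustration only, here is a minimal Python sketch using the
    standard unicodedata module. The dictionary name SA_FORMS and the
    helper names are hypothetical, chosen for this example. It shows
    that normalization can relate these representations to one another,
    but still does not say which one is "correct" to use:

```python
import unicodedata

# Four ways of encoding the Hangul syllable "sa".
# SA_FORMS is a hypothetical name used only for this sketch.
SA_FORMS = {
    "conjoining jamo": "\u1109\u1161",
    "precomposed syllable": "\uC0AC",
    "compatibility jamo": "\u3145\u314F",
    "halfwidth jamo": "\uFFB5\uFFC2",
}


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def normalize_all(form):
    """Apply one normalization form (e.g. 'NFC', 'NFKC') to every variant."""
    return {label: unicodedata.normalize(form, s) for label, s in SA_FORMS.items()}


if __name__ == "__main__":
    for label, s in SA_FORMS.items():
        nfc = unicodedata.normalize("NFC", s)
        nfkc = unicodedata.normalize("NFKC", s)
        print(label, code_points(s), "NFC:", code_points(nfc), "NFKC:", code_points(nfkc))
```

    Canonical composition (NFC) only unifies the conjoining jamo
    sequence with the precomposed syllable. The compatibility and
    halfwidth jamos fold to the same syllable only under the
    compatibility form NFKC.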

    Fixing things for Biblical Hebrew by adding holam haser for vav,
    qamats qatan, clarifying positional rules for meteg, etc., makes
    it possible for users to make some distinctions they
    need to make in Unicode representation of Biblical texts.
    But it doesn't end up with the Unicode Standard dictating how
    you should (= "must") spell the Biblical texts.

    > as well as with Doug
    > Ewell's definition of stability that "it does not change in a way that
    > causes existing implementations or data to break".

    Properly restated (for which see Asmus' discussion),
    this is the primary reason for the continued existence of the
    UTC now more than 14 years after its founding. The members
    involved have huge implementation commitments to protect and
    are involved in stupendously massive data collections whose
    stability must be guaranteed.

    Every new addition or change to the standard is subjected to
    detailed scrutiny, with most of the concern being essentially
    summed up as, "Will this change break my existing implementations
    or result in corruption or obsolescence or other bad things
    happening to my existing data stores?"

    Such considerations cannot prevent the standard from changing at
    all, since implementation needs also drive the need to add
    new characters and occasionally reinterpret some of the rules
    of the standard. But considerations for stability *do* constitute
    a very, very significant barrier to arbitrary changes in the
    standard now.

    --Ken
