Re: Level of Unicode support required for various languages

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Oct 30 2007 - 18:25:47 CST


    James Kass said:

    > >Yes, but remember that you have two PUA planes to use, planes 15 and
    > >16. Unless you're anticipating more than 100K total, I think you'll
    > >be OK.
    >
    > While I agree that the PUA might be easier to display initially,
    > it should be noted that the advantages of IDSequences include
    > the idea that they are *standard*

    A crucial splitting of hairs is required here.

    The IDCs (Ideographic Description Characters, U+2FF0..U+2FFB)
    are standardized. And the unified ideographs and radical
    symbol characters that can be used with them are also
    standardized.

    The IDSs (Ideographic Description Sequences) are decidedly
    *NOT* standardized.

    Which is part of the main point John Jenkins has been making.
    All an IDS tells you is (roughly) what the intended appearance
    is of some Han ideographic shape for a character. It tells
    you nothing about the *identity* of that character, nor does
    it tell you whether somebody else's related IDS is or is not
    the "same" character.

    *Instances* of IDSs have no status whatsoever in the standard.
    All that has status is the *concept* of an IDS (and the
    syntax for expressing them).
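
    To make the "concept and syntax" point concrete, here is a minimal
    sketch of a syntax check for an IDS, using only the IDC range
    named above. The function names are hypothetical, and treating
    every non-IDC character as a valid operand is a simplifying
    assumption (the standard restricts operands further). Note what
    the check does not do: it says nothing about which character, if
    any, a well-formed sequence identifies.

```python
# Arity of each Ideographic Description Character in U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the others take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3


def parse_ids(text, pos=0):
    """Parse one IDS starting at pos; return (tree, next_pos).

    A tree is either a single operand character or a tuple
    (idc, child, child[, child]).
    """
    if pos >= len(text):
        raise ValueError("unexpected end of sequence")
    ch = text[pos]
    arity = IDC_ARITY.get(ch)
    if arity is None:
        # Simplifying assumption: any non-IDC character is an operand.
        return ch, pos + 1
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse_ids(text, pos)
        children.append(child)
    return (ch, *children), pos


def is_well_formed_ids(text):
    """True if text is exactly one syntactically complete IDS."""
    try:
        _, end = parse_ids(text)
    except ValueError:
        return False
    return end == len(text)
```

    For example, U+2FF0 U+5973 U+5B50 (woman beside child) passes this
    check. It happens to describe the already encoded U+597D, but
    nothing in the sequence itself tells you that.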

    In terms of information content, an IDS is one step up
    from a PUA character. For a PUA character, in the absence
    of a detailed mutual agreement, you know nothing about
    the character other than it is a character. For an
    IDS, you know the intended approximate shape of the
    character and that the intent is to describe a Han
    ideographic character (as opposed to an Ethiopic letter
    or an unencoded Vai syllable). But in the absence of a
    detailed mutual agreement, you can't even know if the
    IDS is describing an unencoded character or an *encoded*
    character, nor, if it is encoded, which one.
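
    The comparison can be sketched in code. This is only an
    illustration: the function names and the wording of the returned
    strings are my own, and the check on the leading character is a
    rough stand-in for real IDS parsing. The capacity helper counts
    the code points in the two supplementary private use planes
    (planes 15 and 16) mentioned in the quoted text.

```python
import unicodedata

# Private use ranges: the BMP block, then planes 15 and 16.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


def is_private_use(ch):
    """True if ch has General_Category Co (private use)."""
    return unicodedata.category(ch) == "Co"


def supplementary_pua_capacity():
    """Number of private use code points in planes 15 and 16."""
    return sum(hi - lo + 1 for lo, hi in PUA_RANGES[1:])


def known_without_agreement(text):
    """Describe what a receiver can infer with no private agreement."""
    if len(text) == 1 and is_private_use(text):
        return "it is a character, nothing more"
    if text and 0x2FF0 <= ord(text[0]) <= 0x2FFB:
        # Starts with an IDC: a shape description, not an identity.
        return "approximate shape of a Han ideograph, identity unknown"
    return "standard encoded text"
```

    Here supplementary_pua_capacity() gives 131068, which is why
    "100K total" is a reasonable rule of thumb for the two planes.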

    --Ken

    > and that input of sequences
    > using standard characters is already supported.
    >
    > Best regards,
    >
    > James Kass
    >
    >
    >


