RE: Level of Unicode support required for various languages

From: vunzndi@vfemail.net
Date: Thu Oct 25 2007 - 20:34:50 CDT

    Dear Peter,

    The exact set of plane 2 characters of course depends on the context
    one is talking about; however, applications need to be able to support
    planes 1-16. The most obvious set is the Cantonese characters found
    in plane 2. In addition, various books and even newspapers often
    require characters in plane 2.
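
    As a rough illustration of what "plane" means here, the plane of a
    character is simply its code point divided by 65536. The sketch below
    is only an example; the function names plane_of and
    needs_supplementary_support are made up for illustration.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0-16) that a code point belongs to."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 (65536) code points.
    return code_point >> 16


def needs_supplementary_support(text: str) -> bool:
    """True if the text contains any character outside the BMP (plane 0)."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # U+4E00 is in the BMP; U+20000 is the first code point of plane 2.
    for cp in (0x4E00, 0x20000):
        print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```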

    I am aware that the original aim of Unicode was to have all 'useful'
    characters in the BMP. However, as far as CJKV characters are concerned
    this has not been done; rather, characters have been added on a
    first-come, first-served basis. If the allocation of CJKV code points
    continues to be done in this way, then modern CJKV coverage will
    require not only BMP and plane 1 support but also, in the future,
    plane 3 support.

    Plane 2 includes various Cantonese characters. The characters that are
    as yet unencoded include a large number of place names; any already
    submitted to the IRG should end up in plane 2, but any submitted in
    the future could well be in plane 3. Not to mention characters used by
    'small' communities such as the Zhuang, with a population of over 10
    million.

    There are two slightly different questions here:-

        (1) What characters a font should include:-

    A font can hold only a limited number of CJK glyphs (TTF files are
    limited to 65536 glyphs), so in this case one chooses the most useful
    characters. Even in the simple case one has to decide in what order to
    make the CJK glyphs. One example that puts useful characters first is
    uming.ttf, which includes quite a number of plane 2 characters but
    not full Extension A support.
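
    To make the glyph limit concrete, here is a small sketch that adds up
    the sizes of three CJK blocks and compares the total with 65536. The
    block ranges are the standard ones; the figures are block sizes
    (including a few reserved slots), not counts of assigned characters,
    and the function names are made up for illustration.

```python
# (first, last) code point of each block. Block sizes include reserved
# slots, so the number of assigned characters is slightly smaller.
CJK_BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "CJK Unified Ideographs Extension A": (0x3400, 0x4DBF),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
}

TTF_GLYPH_LIMIT = 65536  # glyph limit mentioned in the text


def block_size(first: int, last: int) -> int:
    """Number of code point slots in an inclusive range."""
    return last - first + 1


def total_slots(blocks: dict) -> int:
    """Sum of the sizes of all given blocks."""
    return sum(block_size(first, last) for first, last in blocks.values())


def fits_in_one_font(blocks: dict, limit: int = TTF_GLYPH_LIMIT) -> bool:
    """True if one glyph per slot would stay within the glyph limit."""
    return total_slots(blocks) <= limit


if __name__ == "__main__":
    for name, (first, last) in CJK_BLOCKS.items():
        print(f"{name}: {block_size(first, last)} slots")
    print("total:", total_slots(CJK_BLOCKS))
    print("fits in one font:", fits_in_one_font(CJK_BLOCKS))
```

    Under these assumptions the three blocks together exceed the limit,
    which is why a font maker has to pick and order the glyphs.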

    In practice, modern dictionaries designed for high school/college level
    students tend to include about 20 to 25 thousand characters; however,
    different regions use some different characters, so one could argue
    that over 30 thousand characters are required as a minimum.

        (2) What features an application should support:-

    IMHO applications need to support surrogates in this day and age. For
    example, for one project I used Perl Tk; however, I discovered too late
    that Perl Tk does not support surrogates. In this case that is the
    difference between an application that is widely used and a dead end.
    I would therefore urge all developers to include surrogate support in
    the core features of their applications.
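
    For anyone unsure what surrogate support involves, the following is a
    minimal sketch of the standard UTF-16 arithmetic that maps a code
    point beyond the BMP to a high/low surrogate pair and back. The
    function names to_surrogates and from_surrogates are only
    illustrative; a real application would normally rely on its
    toolkit's own UTF-16 handling.

```python
def to_surrogates(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    # U+20000 is the first code point of plane 2.
    high, low = to_surrogates(0x20000)
    print(f"U+20000 -> {high:04X} {low:04X}")
    print(f"round trip -> U+{from_surrogates(high, low):04X}")
```

    An application that treats each 16-bit unit as one character will
    split such a pair when deleting, selecting or moving the cursor,
    which is the kind of failure that surrogate support avoids.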

    What other modern languages, apart from CJKV, require support beyond the BMP?

    Yours sincerely
    John Knightley

    Quoting Peter Constable <petercon@microsoft.com>:

    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of vunzndi@vfemail.net
    >
    >> You certainly need support for plane 2 characters; some really obscure
    >> Chinese characters are in the BMP, but some very useful ones are in
    >> plane 2.
    >
    > I wonder if you could elaborate. We hear that CJK users typically
    > use well under 10K characters, and for years there have been
    > implementations using character sets that didn't include any of the
    > Plane 2 characters and that, evidently, were adequate for lots of
    > usage. So, it's not obvious that Plane 2 characters would be needed
    > in all application scenarios. (Of course, Tim hasn't really said
    > much about his application scenario.) I do note that the II Core set
    > includes 22 Plane 2 characters; are these the characters you had in
    > mind? In what scenarios is it important to support them?
    >
    >
    >
    > Peter
    >
    >
    >
    >




    This archive was generated by hypermail 2.1.5 : Thu Oct 25 2007 - 20:38:14 CDT