Re: Level of Unicode support required for various languages

From: Andrew West (andrewcwest@gmail.com)
Date: Thu Oct 25 2007 - 12:13:49 CDT


    On 25/10/2007, Peter Constable <petercon@microsoft.com> wrote:
    >
    > I wonder if you could elaborate. We hear that CJK users typically use well under 10K characters, and for years there have been implementations using character sets that didn't include any of the Plane 2 characters and that, evidently, were adequate for lots of usage. So, it's not obvious that Plane 2 characters would be needed in all application scenarios. (Of course, Tim hasn't really said much about his application scenario.) I do note that the II Core set includes 22 Plane 2 characters; are these the characters you had in mind? In what scenarios is it important to support them?

    There are 62, according to <http://www.cse.cuhk.edu.hk/~irg/irg/IICore/IICore.htm>

    Offhand, I don't really recognise any of them, but maybe they are
    mostly used in the barbarian southern dialect.

    On the other hand, there are a few characters not in the IICORE set
    but which are commonly used in colloquial Mandarin that are in the
    SIP, such as U+24B62 𤭢 cei4 "to break".
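    To make the surrogate-pair point concrete: U+24B62 lies in Plane 2,
    outside the BMP, so in UTF-16 it takes two code units. Below is a
    minimal Python sketch of the standard UTF-16 surrogate arithmetic,
    cross-checked against Python's own codec. The helper name
    `to_utf16_units` is hypothetical and only used for illustration.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x10000:
        return [cp]                       # BMP: a single code unit
    v = cp - 0x10000                      # 20-bit offset into the supplementary planes
    return [0xD800 + (v >> 10),           # high (lead) surrogate: top 10 bits
            0xDC00 + (v & 0x3FF)]         # low (trail) surrogate: bottom 10 bits

ch = "\U00024B62"                         # U+24B62, a SIP (Plane 2) ideograph
units = to_utf16_units(ord(ch))
print([hex(u) for u in units])            # ['0xd852', '0xdf62']

# Cross-check against Python's UTF-16 encoder (big-endian, no BOM)
encoded = ch.encode("utf-16-be")
assert units == [int.from_bytes(encoded[i:i + 2], "big")
                 for i in range(0, len(encoded), 2)]

print(len(ch))                            # 1 code point
print(len(encoded) // 2)                  # 2 UTF-16 code units
```

    The same arithmetic applies to every Plane 2 ideograph, which is why
    a BMP-only (UCS-2) implementation cannot represent characters like
    this one at all.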

    And CJK-C has 26 characters sourced to Xiandai Hanyu Cidian, the
    standard PRC short dictionary of modern Chinese, including several
    immediately recognisable characters such as the simplified form of
    U+5D19 (the lun2 in kun1lun2 崑崙).

    In my opinion, there is no point in trying to find an easy way to
    support Unicode without the hassle of combining characters, variation
    selectors, contextual glyph variants, surrogate pairs, etc. If for
    some reason you cannot use existing implementations (and it is
    difficult to imagine a scenario where you can't do so) then you have
    to implement a proper generic solution that will work with everything.
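    As a rough illustration of what a generic solution has to cope with,
    here is a hypothetical Python helper, `truncate_utf16`, that cuts
    text down to a UTF-16 length budget without splitting a surrogate
    pair or detaching a combining mark or variation selector from its
    base character. It approximates grapheme clusters by treating general
    categories Mn, Me and Mc as attaching to the preceding character.
    That is a simplification, not the full UAX #29 algorithm. The sample
    string (U+5D19, U+24B62, then U+845B followed by VARIATION
    SELECTOR-17, U+E0100, itself a supplementary-plane character) is
    chosen only for illustration.

```python
import unicodedata

def utf16_len(s: str) -> int:
    """Number of UTF-16 code units needed to encode s."""
    return len(s.encode("utf-16-le")) // 2

def is_extending(ch: str) -> bool:
    """Rough stand-in for UAX #29 'Extend': marks, including variation selectors."""
    return unicodedata.category(ch) in ("Mn", "Me", "Mc")

def truncate_utf16(s: str, max_units: int) -> str:
    """Keep at most max_units UTF-16 code units, cutting only between
    approximate clusters (a base character plus any following marks)."""
    used = 0
    end = 0
    i = 0
    while i < len(s):
        j = i + 1
        while j < len(s) and is_extending(s[j]):
            j += 1                        # absorb marks / variation selectors
        cost = utf16_len(s[i:j])          # surrogate pairs count as 2 units
        if used + cost > max_units:
            break
        used += cost
        end = i = j
    return s[:end]

s = "\u5D19\U00024B62\u845B\U000E0100"
print(utf16_len(s))                       # 6

# A naive cut at 2 code units leaves a lone high surrogate behind
naive = s.encode("utf-16-le")[:4]
print(hex(int.from_bytes(naive[2:4], "little")))  # 0xd852

print(truncate_utf16(s, 2) == "\u5D19")            # True
print(truncate_utf16(s, 4) == "\u5D19\U00024B62")  # True: VS17 not orphaned
```

    In practice this is exactly the kind of thing to take from an
    existing implementation (for example ICU's BreakIterator) rather
    than hand-roll; the sketch only shows why the shortcuts fail.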

    Andrew



    This archive was generated by hypermail 2.1.5 : Thu Oct 25 2007 - 12:17:04 CDT