Re: Level of Unicode support required for various languages

From: vunzndi@vfemail.net
Date: Thu Oct 25 2007 - 22:42:31 CDT


    Quoting David Starner <prosfilaes@gmail.com>:

    > On 10/25/07, vunzndi@vfemail.net <vunzndi@vfemail.net> wrote:
    >> I am aware that the original aim of Unicode was to have all 'useful'
    >> characters in the BMP. However as far as CJKV characters are concerned
    >> this has not been done, rather characters have been added on a first
    >> come first serve basis.
    >
    > The character set standards of China, Taiwan, Japan and Korea were
    > completely included in the BMP. The sets of characters that computer
    > users of CJKV characters were actually using are all in the BMP. That
    > was not a first come, first serve policy. Unicode continued to add the
    > characters that the standards bodies of those nations thought were
    > important to the BMP for several years. It was not a first come, first
    > serve basis.
    >
    >> If the allocation of CJKV codepoints continues
    >> to be done in this way, then modern CJKV coverage will require not
    >> only BMP and plane 1 support but also, in the future, plane 3 support.
    >
    > (Should be plane 2, BTW.)
    >
    > If it continues to be done in what way? They currently have teams of
    > experts sorting through the body of writing in Han ideographs, finding
    > new distinct ideographs, and identifying what most needs encoding.
    > Short of God handing the next set of Han ideographs down from Mt.
    > Sinai on stone tablets, I don't know what improvements can be made.
    >
    >

    First come, first served, in that characters are encoded in the order
    they are submitted to the IRG. A block or extension therefore includes a
    mixture of modern and non-modern characters. The order in which the IRG
    processes characters is based on when they were submitted, not on
    whether the character's usage is ancient or modern.
    Extensions C and D are the characters submitted to the IRG in or before
    2002. Characters submitted from 2002 to the present will, under this
    system, be put in Extension E unless there is a change of IRG policy,
    and any characters submitted to the IRG in the future will, if the same
    approach is followed, go into Extensions F, G, ...

    Now of course some of the first set suggested are in wide use, but
    not all of them. Even in Extension A there exist characters like
    U+416B, for which one could safely bet one million pounds against anyone
    being able to state its meaning or pronunciation.
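
    For anyone wanting to check where a given character sits, the plane
    is simply the code point divided by 0x10000. What follows is only a
    minimal sketch: the function names plane_of and cjk_block_of are made
    up for illustration, and the block ranges are taken from the published
    Unicode block list rather than from anything in this thread.

```python
# Illustrative sketch only: helper names are hypothetical, and the ranges
# below are the block boundaries from the Unicode block list (assumption:
# only these three CJK blocks are of interest here).
CJK_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def plane_of(code_point):
    """Return the plane number: 0 is the BMP, 2 is the plane holding Extension B."""
    return code_point >> 16


def cjk_block_of(code_point):
    """Return the name of the listed CJK block containing code_point, or None."""
    for name, first, last in CJK_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


if __name__ == "__main__":
    for cp in (0x416B, 0x4E00, 0x20000):
        print("U+%04X" % cp, "plane", plane_of(cp), cjk_block_of(cp))
```

    Run on U+416B, this reports plane 0 and Extension A, i.e. the
    character occupies a BMP slot regardless of how rarely it is used.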

    To the best of my knowledge, at no point did people discuss
    which CJKV characters are in modern use and leave space in the BMP
    for them; or, by the time anyone thought about this, the BMP had already
    been filled up with too much CJKV clutter.

    regards
    John



