Re: [unicode] CJK variation modifier

From: mpsuzuki@hiroshima-u.ac.jp
Date: Mon May 21 2007 - 04:50:40 CDT

    Dear Gerrit,

    On Sat, 19 May 2007 13:54:30 +0200
    Gerrit Sangel <z0idberg@gmx.de> wrote:
    >I just read some papers about the han unification
    >but am a bit confused if there is something like
    >a modifier control character.
    >
    >As far as I know, even though some radicals/characters
    >are different in Chinese and Japanese, they were unified.
    >
    >But using a separate font in case the character is now
    >Chinese, Japanese or Korean is not always possible
    >(think of file names, mp3 tags, plain text files and so on),
    >so I wondered if there is something like a control
    >character in proposal?

    I guess what you want was once proposed as
    "language tagging".
            http://unicode.org/faq/languagetagging.html
            http://www.unicode.org/reports/tr7/tr7-4.html
    It was made obsolete, because a language specification in
    plain Unicode text would conflict with higher-level
    language specifications in XML, HTML, etc. An ISO-2022
    encoding may be a better solution for such a requirement.
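
To make the "language tagging" idea concrete, here is a minimal Python sketch of how such a tag would sit in plain text: U+E0001 LANGUAGE TAG followed by tag characters that mirror ASCII at an offset of 0xE0000. The helper names (make_language_tag, tag_text) are hypothetical and only for illustration; this is not a recommendation to use the mechanism.

```python
# Sketch of the Plane 14 language-tag mechanism discussed above.
LANGUAGE_TAG = "\U000E0001"   # U+E0001 LANGUAGE TAG
TAG_OFFSET = 0xE0000          # tag characters mirror printable ASCII at this offset


def make_language_tag(lang: str) -> str:
    """Return the tag-character sequence for an ASCII language code such as 'ja'."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return LANGUAGE_TAG + "".join(chr(ord(c) + TAG_OFFSET) for c in lang)


def tag_text(lang: str, text: str) -> str:
    """Prefix plain text with a language tag (hypothetical helper)."""
    return make_language_tag(lang) + text


tagged = tag_text("ja", "\u8336")   # U+8336 is the character from the question
print([f"U+{ord(c):04X}" for c in tagged])
```

The tag travels inside the text itself, which is exactly why it collides with language information supplied by a higher-level protocol such as xml:lang.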

    >For example, using U+8336 茶 (I think, this character has
    >different variants in Chinese and Japanese) and then
    >append a control character to let the display program
    >decide whether it should use a Chinese or Japanese glyph.

    It was a popular view that specifying the language
    is sufficient to select the appropriate glyph shape, when
    the glyph collection in focus was the 5-column CJKV list in
    the ISO 10646 specification. But it was incorrect, I think.

    >It seems, there is also a Variation Database
    >http://www.unicode.org/reports/tr37/
    >accepted, but as far as I understood it, it is not really
    >a clearly defined way, e.g. that variation 2 of character x
    >has always the same specific appearance.

    Some people expect UTS #37 to be a database of unique glyphs,
    but it is not. The IVD is a registry of variation sequences
    (IVS) for ideographs, to avoid VS conflicts in interchange.
    Before UTS #37, system A could use U+xxxx U+E0100 for glyph A
    while system B used U+xxxx U+E0100 for glyph B. This is a conflict.
    After UTS #37, once U+xxxx U+E0100 is registered for glyph A,
    system B cannot use U+xxxx U+E0100 for glyph B.
    Although the IVD says nothing about the shape of glyph A,
    the conflict is prevented.
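
The conflict-avoidance role of the registry can be sketched as a toy model in Python. This is only an illustration under stated assumptions: ToyRegistry, its method, and the collection names are hypothetical, and the real IVD registration process is more involved. The only fact taken from Unicode is that the ideographic variation selectors occupy U+E0100..U+E01EF.

```python
# Toy model: a registry that refuses to assign one variation sequence twice.
VS17 = 0xE0100    # VARIATION SELECTOR-17, first selector used in an IVS
VS256 = 0xE01EF   # VARIATION SELECTOR-256, last selector used in an IVS


class ToyRegistry:
    """Hypothetical stand-in for a registry of ideographic variation sequences."""

    def __init__(self):
        self._entries = {}   # (base, selector) -> (collection, glyph identifier)

    def register(self, base: str, selector: str, collection: str, glyph_id: str) -> str:
        if not VS17 <= ord(selector) <= VS256:
            raise ValueError("not an ideographic variation selector")
        key = (base, selector)
        if key in self._entries:
            # The sequence is already taken, so a second meaning is rejected.
            raise KeyError(f"sequence already registered by {self._entries[key][0]}")
        self._entries[key] = (collection, glyph_id)
        return base + selector   # the sequence as it would appear in plain text


registry = ToyRegistry()
ivs = registry.register("\u8336", chr(VS17), "SystemA", "glyph-A")

try:
    registry.register("\u8336", chr(VS17), "SystemB", "glyph-B")
    conflict_blocked = False
except KeyError:
    conflict_blocked = True

print(len(ivs), conflict_blocked)
```

Note that the registry stores only an identifier for glyph A, not its shape, which matches the point above: uniqueness of the sequence is guaranteed, the appearance is not.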

    >If there were a way to store the information about
    >the variation of the character in the text itself,
    >I think, it would be possible to create a font
    >to include all CJK characters?

    To include all CJK characters, with glyphs for each
    language, the number of glyphs will be greater than 64k
    (the size of CJK Unified Ideographs (incl. all Extensions)
     is already about 64k - if we collect the non-unified variants
    for CJKV, the number must be a few times greater than
    64k). They cannot be packed into a single TrueType/OpenType
    font, which has a limit of 64k glyphs. You would have
    to implement a new font format for a larger glyph collection,
    plus rasterizers, text renderers, etc. I guess that is not
    what you want.
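
The arithmetic behind the glyph limit can be sketched in a few lines of Python. The figures are assumptions for illustration: 64k is the rough ideograph count quoted above rather than an exact number, and one full glyph set per locale (C, J, K, V) is a worst-case upper bound, since many ideographs share a shape across locales.

```python
import math

MAX_GLYPHS_PER_FONT = 0xFFFF      # 16-bit glyph count: at most 65,535 glyphs per font
UNIFIED_IDEOGRAPHS = 64 * 1024    # rough figure from the text, not an exact count
LOCALES = ("C", "J", "K", "V")    # assumption: one separate glyph set per locale


def glyphs_needed(ideographs: int, locale_count: int) -> int:
    """Worst-case glyph total if every locale needs its own glyph per ideograph."""
    return ideographs * locale_count


def fonts_needed(glyph_total: int, per_font: int = MAX_GLYPHS_PER_FONT) -> int:
    """Number of separate fonts required to hold glyph_total glyphs."""
    return math.ceil(glyph_total / per_font)


total = glyphs_needed(UNIFIED_IDEOGRAPHS, len(LOCALES))
print(total, total > MAX_GLYPHS_PER_FONT, fonts_needed(total))
```

Even a much smaller variant count than this upper bound exceeds what a single font can hold, so the glyphs would have to be split across several fonts or font-collection members.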

    Regards,
    mpsuzuki