Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)

From: Andrew West
Date: Thu Nov 01 2007 - 08:41:45 CST

    On 01/11/2007, <> wrote:
    > Most characters are like your name,

    On the other hand, most ideographic characters are not at all like the
    character for Ben's name, which is not a single abstract character
    but is really just a ligature of its two components, U+9580 門 "mon"
    plus U+9F8D 龍 "rou". It has no meaning in itself other than being a
    phonetic representation of "Monroe".

    Such ligatures are very rare in Chinese (the typical example is U+74E9
    瓩 qian1wa3 "kilowatt"), but more common in Japanese -- the name of
    Kitagawa Utamaro 喜多川 歌麿 comes to mind, where U+9EBF 麿 maro is a
    ligature of the characters U+9EBB 麻 and U+5415 吕 (I am probably wrong
    on this, but from a quick google it seems that he may have been the
    first person to join the two characters into one, and in earlier times
    the two components of the character were written separately).

    If you were going to ask me what the "best" way to represent kanji
    ligatures such as <U+2FF5 U+9580 U+9F8D> would be under an ideal
    Unicode model, I would say as <U+9580 U+200D U+9F8D>, using ZWJ to
    indicate the ligation, and smart fonts would ligate the two components
    into a single glyph if they could. But back in the real world, the
    approach taken has been to encode kanji ligatures (and kana ligatures,
    and even kana/kanji hybrid ligatures) as separate characters, so Ben's
    character is a potential candidate for encoding.
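
    To make the two sequences above concrete, here is a minimal Python
    sketch that builds each one and prints its code points. The constant
    and function names are illustrative assumptions, not anything defined
    by Unicode, and whether a font actually ligates the ZWJ sequence
    depends entirely on the font and the rendering engine.

```python
# Illustrative sketch only: the names below are made up for this example.
IDC_SURROUND_FROM_ABOVE = "\u2FF5"  # ideographic description character
ZWJ = "\u200D"                      # ZERO WIDTH JOINER
MON = "\u9580"                      # 門
RYU = "\u9F8D"                      # 龍


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Ideographic description sequence: describes the shape, is not a ligature.
ids_sequence = IDC_SURROUND_FROM_ABOVE + MON + RYU

# Hypothetical "ideal model" spelling: the two components joined by ZWJ.
zwj_sequence = MON + ZWJ + RYU

print(code_points(ids_sequence))  # U+2FF5 U+9580 U+9F8D
print(code_points(zwj_sequence))  # U+9580 U+200D U+9F8D
```

    Either way the text is three code points long, whereas a separately
    encoded ligature such as U+9EBF is a single code point.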

