Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)

From: Andrew West (andrewcwest@gmail.com)
Date: Fri Nov 02 2007 - 04:51:46 CST

Next message: vunzndi@vfemail.net: "Re: Codespace Anxiety Redux (was: Re: Level of Unicode support required ...)"

Previous message: James Kass: "Re: Encoding Personal Use Ideographs"
In reply to: John H. Jenkins: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Next in thread: vunzndi@vfemail.net: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Reply: vunzndi@vfemail.net: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Reply: James Kass: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"

On 01/11/2007, John H. Jenkins <jenkins@apple.com> wrote:
>
> > If you were going to ask me what the "best" way to represent kanji
> > ligatures such as <U+2FF5 U+9580 U+9F8D> would be under an ideal
> > Unicode model, I would say as <U+9580 U+200D U+9F8D>, using ZWJ to
> > indicate the ligation, and smart fonts would ligate the two components
> > into a single glyph if they could.
>
> Actually, do it without the ZWJ, which would break the IDS syntax.
> Just make the ligature on by default.

To clarify, in my ideal world IDS sequences would not be composable
into a single glyph by fonts, but would always be rendered as a
sequence of IDC and ideographic characters. I would use ZWJ for
hanzi/kanji ligation without any IDC characters. The obvious
disadvantage to this is that it does not give the font any clues as to
what the character should look like, but that is true for all scripts
that have ligatures. In the case of simple kanji ligatures the
resultant glyph is usually self-evident, but in any case font
designers would probably have to know which particular kanji ligatures
they wanted to support in the first place.
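To make the two encodings concrete, here is a small sketch, assuming Python 3 and only its standard `unicodedata` module, that spells out the IDS form and the ZWJ form of the gate-plus-dragon ligature as code point sequences. The IDS places the description character U+2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE before its two operands. The ZWJ form just joins the components in their written order. The helper `describe` and the constant names are hypothetical, chosen for illustration.

```python
import unicodedata

GATE = "\u9580"                      # 門 U+9580
DRAGON = "\u9F8D"                    # 龍 U+9F8D
IDC_SURROUND_FROM_ABOVE = "\u2FF5"   # ideographic description character
ZWJ = "\u200D"                       # ZERO WIDTH JOINER

# IDS: the description character comes first, followed by its two operands
ids_form = IDC_SURROUND_FROM_ABOVE + GATE + DRAGON
# ZWJ ligation: the components in order, with ZWJ requesting a ligature
zwj_form = GATE + ZWJ + DRAGON

def describe(s):
    """Hypothetical helper: list (code point, character name) pairs for a string."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in s]

for label, s in (("IDS", ids_form), ("ZWJ", zwj_form)):
    print(label)
    for cp, name in describe(s):
        print("  ", cp, name)
```

Both forms are three code points long. If a font cannot ligate the ZWJ form, it still falls back to the two plain ideographs. The IDS form, by contrast, always carries the extra description character.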

The beauty of the ZWJ model (or evilness of the model, depending on
your point of view) is that an A-ZWJ-B ligature may look exactly the
same as a B-ZWJ-A ligature but would be treated as distinct entities.
Thus, if someone wanted to create a ligature of U+9F8D 龍 long2
"dragon" and U+9580 門 men2 "gate" as a cute way of writing Longmen 龍門
"Dragon's Gate", with U+9F8D inside U+9580, they could do so with the
sequence <U+9F8D U+200D U+9580> (representing the logical order of
the ligatured characters). This would render the same as Ben's
<U+9580 U+200D U+9F8D>, but would be treated differently by search
engines, etc.
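The "distinct entities" point can be checked directly. The two ZWJ sequences are different strings, and Unicode normalization leaves both unchanged, because neither ideograph nor ZWJ has a decomposition. The sketch below also makes an illustrative assumption: a search indexer drops ZWJ before indexing. Under that assumption, each sequence reduces to its components in logical order, so only the dragon-first sequence matches a search for 龍門. The `index_key` function is hypothetical and does not describe how any particular search engine behaves.

```python
import unicodedata

ZWJ = "\u200D"
LONGMEN = "\u9F8D\u9580"                 # 龍門, plain text without ZWJ
gate_dragon = "\u9580" + ZWJ + "\u9F8D"  # <U+9580 U+200D U+9F8D>
dragon_gate = "\u9F8D" + ZWJ + "\u9580"  # <U+9F8D U+200D U+9580>, logical order of 龍門

def index_key(s):
    """Hypothetical indexer step: NFC-normalize, then drop ZWJ so only the components remain."""
    return unicodedata.normalize("NFC", s).replace(ZWJ, "")

# Distinct strings, and NFC leaves each one as it is
print(gate_dragon == dragon_gate)                              # False
print(unicodedata.normalize("NFC", gate_dragon) == gate_dragon)  # True

# Only the sequence in logical order matches a search for 龍門
print(LONGMEN in index_key(dragon_gate))  # True
print(LONGMEN in index_key(gate_dragon))  # False
```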

Incidentally, if Ben does want to find evidence for <U+2FF5 U+9580
U+9F8D> that will satisfy UTC and WG2 then my suggestion is that he
trawls through the corpus of literature relating to the Longmen
Grottoes <http://en.wikipedia.org/wiki/Longmen_Grottoes> and ancient
descriptions of walled cities with gates named Longmen -- I'm sure
that someone sometime somewhere must have already created the
character as a shorthand for <U+9F8D U+9580>. The thing that really
surprises me is that it is not already encoded, when we have
characters such as:

U+49B0 䦰 gate + tortoise/turtle
U+95A9 閩 gate + insect
U+95D6 闖 gate + horse
U+28CEF 𨳯 gate + ox
U+28D2F 𨴯 gate + pig
U+28D58 𨵘 gate + tiger
U+28D5C 𨵜 gate + frog
U+28D85 𨶅 gate + lamb
U+28D87 𨶇 gate + crow
U+28DA0 𨶠 gate + bird
U+28DA2 𨶢 gate + fish
U+28DCD 𨷍 gate + tortoise/turtle
U+28DDF 𨷟 gate + tortoise/turtle
U+28DF7 𨷷 gate + insect
U+28DFA 𨷺 gate + tortoise/turtle
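For reference, the fifteen characters listed above are spread across three blocks. One is in CJK Unified Ideographs Extension A, two are in the original CJK Unified Ideographs block, and twelve are in Extension B. The sketch below tallies them, assuming Python 3's `unicodedata`. The `BLOCKS` table is typed in by hand for illustration and is not read from the Unicode data files. The name check relies on unified ideographs having algorithmically derived names of the form `CJK UNIFIED IDEOGRAPH-XXXX`.

```python
import unicodedata
from collections import Counter

# Gate-radical characters cited above: code point -> the component inside 門
GATE_CHARS = {
    0x49B0: "tortoise/turtle", 0x95A9: "insect", 0x95D6: "horse",
    0x28CEF: "ox", 0x28D2F: "pig", 0x28D58: "tiger", 0x28D5C: "frog",
    0x28D85: "lamb", 0x28D87: "crow", 0x28DA0: "bird", 0x28DA2: "fish",
    0x28DCD: "tortoise/turtle", 0x28DDF: "tortoise/turtle",
    0x28DF7: "insect", 0x28DFA: "tortoise/turtle",
}

# Block ranges written out by hand as (name, first, last), inclusive
BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]

def block_of(cp):
    """Return the name of the hand-listed block containing cp, or 'other'."""
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"

# Every listed code point should be an assigned unified ideograph
for cp in GATE_CHARS:
    assert unicodedata.name(chr(cp)) == f"CJK UNIFIED IDEOGRAPH-{cp:04X}"

tally = Counter(block_of(cp) for cp in GATE_CHARS)
for block, count in tally.items():
    print(f"{block}: {count}")
```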

Andrew


This archive was generated by hypermail 2.1.5 : Fri Nov 02 2007 - 04:54:32 CST