RE: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 02 2007 - 20:03:55 CST

Next message: Bala: "RE: Tamil Sri / Shri"

Previous message: Mahesh T. Pai: "Re: Tamil Sri / Shri"
In reply to: James Kass: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Next in thread: James Kass: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Reply: James Kass: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"

James Kass wrote:
> Andrew West wrote,
>
> > The beauty of the ZWJ model (or evilness of the model, depending on
> > your point of view) is that an A-ZWJ-B ligature may look exactly the
> > same as a B-ZWJ-A ligature but would be treated as distinct entities.
> > Thus, if someone wanted to create a ligature of U+9F8D 龍 long2
> > "dragon" U+9580 門 men2 "gate" as cute way of writing Longmen 龍門
> > "Dragon's Gate", with U+9F8D inside U+9580 they could do so with the
> > sequence <U+9F8D U+200D U+9580> (representing the logical order of
> > the ligatured characters). This would render the same as Ben's
> > <U+9580 U+200D U+9F8D>, but would be treated differently by search
> > engines, etc.
>
> Are you sure they would both render the same?

Certainly, specifying a ligature with ZWJ will not be sufficient, as it does
not indicate the type of "ligature" performed: enclosure of the second
ideograph within the first one, or superposition of smaller sizes, or
juxtaposition of narrowed ideographs within the same square.
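
To make the point concrete, here is a minimal Python sketch (my own
illustration; the variable names are assumptions, not part of any proposal).
It shows that the two ZWJ sequences quoted above are distinct code point
sequences that say nothing about layout, whereas an IDS names the layout
through its IDC, here U+2FF5 for enclosure from above.

```python
import unicodedata

ZWJ = "\u200D"    # ZERO WIDTH JOINER
LONG = "\u9F8D"   # 龍 "dragon"
MEN = "\u9580"    # 門 "gate"

# The two ZWJ sequences from the quoted message.
seq_long_first = LONG + ZWJ + MEN
seq_men_first = MEN + ZWJ + LONG

# They are different strings, and canonical normalization does not merge
# them, so plain string comparison and search treat them as distinct.
distinct = seq_long_first != seq_men_first
distinct_after_nfc = (
    unicodedata.normalize("NFC", seq_long_first)
    != unicodedata.normalize("NFC", seq_men_first)
)

# Nothing in either sequence says "enclose", "overlay" or "juxtapose".
# An IDS carries that information in its leading IDC instead.
IDC_SURROUND_FROM_ABOVE = "\u2FF5"
ids_men_encloses_long = IDC_SURROUND_FROM_ABOVE + MEN + LONG

print(distinct, distinct_after_nfc)
print(unicodedata.name(ZWJ))
print(unicodedata.name(IDC_SURROUND_FROM_ABOVE))
```

Whether a font would actually render either ZWJ sequence as a single glyph is
a separate question that this sketch does not address.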

ZWJ would be even less useful than an IDC within an IDS, notably if you do
not want it to specify the relative order, something that runs completely
against the philosophy of Unicode, which wants a logical ordering based on
semantics (or on the order of syllables in the composed square).

And even in that case, the IDS encoding order is not necessarily the logical
order, or the components may have lost their original semantics because one
component was replaced by a simpler one (quite frequent in simplified Chinese
and in many modern compositions for multisyllabic ideographs built from
ideographs used and interpreted for their syllabic value, such as composite
ideographs derived from transliterations or personal names).

In some compositions, the layout is not necessarily logical (it does not
follow the default ordering implied by the IDS syntax) but is rearranged for
practical or typographical reasons, or for readability. This also occurred in
some old Hangul compositions, before some new letters were created; similar
reasons explain variations in the placement of some diacritics in alphabetic
scripts, including Latin and Greek, and in some abjads like Hebrew.

For this reason, I think that ZWJ is not very suitable for that work; the IDS
defined in TUS are better, but still insufficient. This may be a reason why
the PRC insists on encoding ideographs without trying to decompose them: to
make sure that the semantics are preserved and unambiguous for the common
words or syllables. However, there still remains a problem with newly created
ideographs that are polysyllabic in nature: they are real ligatures, but
their layout is not always logical, and there is a conflict between the IDS
syntax, which just describes the basic layout in a fixed reading and encoding
order, and the semantic logical order:

For example, if there are composed characters whose logical order runs from
bottom to top instead of top to bottom, the IDS will not describe this
correctly. If this ever occurs, will there be variants of the IDC for
vertical composition? And if some strokes of one component are moved to
another relative position or removed, how will you encode it? According to
the IDS you would break the semantics, as the original uncomposed ideograph
would no longer be there.
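
As a rough illustration of the fixed encoding order, here is a small Python
sketch (the helper names parse_ids and leaves are my own, and the IDC sets
cover only U+2FF0..U+2FFB). It reads an IDS and lists its components in
encoding order. For U+2FF1, above to below, the upper component always comes
first, so a character meant to be read bottom to top cannot say so in the IDS
itself. The example uses U+5C71 over U+77F3, the usual description of U+5CA9.

```python
# IDCs taking two operands, and the two that take three.
BINARY_IDCS = set("\u2FF0\u2FF1\u2FF4\u2FF5\u2FF6\u2FF7\u2FF8\u2FF9\u2FFA\u2FFB")
TERNARY_IDCS = set("\u2FF2\u2FF3")


def parse_ids(s, pos=0):
    """Parse one IDS starting at pos.

    Returns (tree, next_pos), where tree is either a single character or
    a tuple (idc, [children]).
    """
    if pos >= len(s):
        raise ValueError("truncated IDS")
    ch = s[pos]
    if ch in BINARY_IDCS:
        arity = 2
    elif ch in TERNARY_IDCS:
        arity = 3
    else:
        return ch, pos + 1  # a plain component
    pos += 1
    children = []
    for _ in range(arity):
        child, pos = parse_ids(s, pos)
        children.append(child)
    return (ch, children), pos


def leaves(tree):
    """List the components of a parsed IDS in encoding order."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


ABOVE_TO_BELOW = "\u2FF1"
MOUNTAIN = "\u5C71"  # 山
STONE = "\u77F3"     # 石

# Upper component first: this is the only order the syntax offers.
tree, _ = parse_ids(ABOVE_TO_BELOW + MOUNTAIN + STONE)
top_first = leaves(tree)

# Swapping the operands does not mean "read bottom to top"; it describes
# a different layout, with the stone on top.
swapped_tree, _ = parse_ids(ABOVE_TO_BELOW + STONE + MOUNTAIN)
swapped = leaves(swapped_tree)

print(top_first, swapped)
```

Any bottom-to-top reading order would therefore have to be recorded outside
the IDS, which is exactly the gap described above.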


This archive was generated by hypermail 2.1.5 : Sat Nov 03 2007 - 10:36:25 CST