Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)

From: James Kass (thunder-bird@earthlink.net)
Date: Fri Nov 02 2007 - 21:09:22 CST

Next message: vunzndi@vfemail.net: "RE: Component Based Han Ideograph Encoding (WAS: Level of Unicode support required for various languages)"

Previous message: mpsuzuki@hiroshima-u.ac.jp: "RE: Component Based Han Ideograph Encoding (WAS: Level of Unicode support required for various languages)"
Maybe in reply to: vunzndi@vfemail.net: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"
Next in thread: Andrew West: "Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)"

Philippe Verdy wrote,

> ZWJ would be even less useful than using IDC in IDS, notably if you want it
> to not specify the relative order (something completely against the
> philosophy of Unicode that wants a logical ordering based on semantics (or
> order of syllables in the composed square).

An IDS should be used only to describe the visual, or relative, order
of components. IDS are a mechanism intended, among other things, to
give the end user a description of the desired character so that the
end user can visualize its appearance as accurately as possible.
Semantics and gloss are mostly disregarded in forming an IDS. The
exception is that semantic units are sometimes left whole, even where
they could be broken down further.

> ...
>
> For this reason, I do think that ZWJ is not very suitable for that work,
> TUS-IDS are better, but still insufficient... ...

IDS are better, indeed. They are also insufficient. Please recall
that the original poster estimates that the IDS would only be
useful about 80% of the time. Even so, being able to generate
glyphs for 80% of a large set of unencoded items would be
very helpful for some of us.

> For example, if there are some composed characters whose logical order is
> from bottom to top, instead of top-to-bottom, the IDS will not describe this
> correctly. If this ever occurs, will there be variants for the vertical
> composition IDC? If some traits of one component is moved on another
> relative place or removed, how will you encode it: according to IDS you
> would break the semantic as the initial non composed ideograph would no
> longer be there?

I'm not sure I understand the question.

Here are some examples from a listing of IDS for encoded characters. These
examples are included in "Annex S" as being non-unifiable because of the
different relative positions of components, even though the components
and their ordering are identical:

U+5CEF,峯,⿱山夆
U+5CF0,峰,⿰山夆
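
The pair above can be compared mechanically. What follows is only a rough
Python sketch, not anything taken from Annex S: split_ids and the listing
dictionary are names made up for illustration, and the sketch assumes flat
(non-nested) sequences that use only the twelve IDCs at U+2FF0..U+2FFB.

```python
# Assumption: only the twelve IDCs in the block U+2FF0..U+2FFB are considered.
IDCS = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}


def split_ids(ids):
    """Split a flat IDS into (layout operators, components), keeping order.

    Hypothetical helper for illustration; it does not handle nesting.
    """
    operators = [ch for ch in ids if ch in IDCS]
    components = [ch for ch in ids if ch not in IDCS]
    return operators, components


# The two entries quoted from the listing.
listing = {"峯": "⿱山夆", "峰": "⿰山夆"}

ops_a, comps_a = split_ids(listing["峯"])
ops_b, comps_b = split_ids(listing["峰"])

# Same components in the same order, but a different layout operator.
same_components = comps_a == comps_b
same_layout = ops_a == ops_b
print(same_components, same_layout)
```

With the two entries as quoted, the components compare equal and only the
IDC differs, which is the distinction being made here.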

When the IDS order is top-to-bottom, the appropriate IDC is used (⿱).
When the IDS order is bottom-to-top, the correct IDCs exist:
⿶ and, possibly, ⿺.

  𠙶 U+20676 ⿶凵了
  𠙷 U+20677 ⿶凵十
  𠙸 U+20678 ⿶凵厶
  𠙹 U+20679 ⿶凵𠀆
  𠙺 U+2067A ⿶凵千
  𠙼 U+2067C ⿶凵口

But 峯 ≠ ⿶夆山. "⿶夆山" would not be a valid IDS.
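
A purely syntactic check would not catch this. The Python sketch below is
only an illustration under stated assumptions: it reads "valid" as "a
correct description of 峯", it treats ⿲ and ⿳ as taking three operands and
the other IDCs in U+2FF0..U+2FFB as taking two, and parse_ids,
is_well_formed, describes and the listing dictionary are all hypothetical
names.

```python
# Assumption: IDCs are U+2FF0..U+2FFB; U+2FF2 and U+2FF3 take three operands.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3
IDC_ARITY["\u2ff3"] = 3


def parse_ids(ids, pos=0):
    """Consume one description starting at pos; return the index just past it."""
    if pos >= len(ids):
        raise ValueError("truncated IDS")
    ch = ids[pos]
    pos += 1
    # An IDC is followed by as many sub-descriptions as its arity;
    # any other character is a single component with no operands.
    for _ in range(IDC_ARITY.get(ch, 0)):
        pos = parse_ids(ids, pos)
    return pos


def is_well_formed(ids):
    """True if the whole string is exactly one complete description."""
    try:
        return parse_ids(ids) == len(ids)
    except ValueError:
        return False


def describes(char, ids, listing):
    """True if ids is well formed and matches the listed description of char."""
    return is_well_formed(ids) and listing.get(char) == ids


# Hypothetical one-entry listing, using the description quoted earlier.
listing = {"峯": "⿱山夆"}

print(is_well_formed("⿶夆山"))          # syntax alone accepts the string
print(describes("峯", "⿶夆山", listing))  # but it does not match the listing
```

On this reading, the string is rejected by the lookup against the listing
and not by the grammar.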

Best regards,

James Kass

This archive was generated by hypermail 2.1.5 : Fri Nov 02 2007 - 21:12:08 CST