Re: Encoding Personal Use Ideographs (was Re: Level of Unicode support required for various languages)

From: James Kass
Date: Fri Nov 02 2007 - 21:09:22 CST


    Philippe Verdy wrote,

    > ZWJ would be even less useful than using IDC in IDS, notably if you want it
    > to not specify the relative order (something completely against the
    > philosophy of Unicode that wants a logical ordering based on semantics (or
    > order of syllables in the composed square)).

    The IDS should only be used to describe the visual or relative order.
    They are a mechanism which is intended for, among other things,
    giving the end user a description of the desired character so that
    the end user may visualize as accurately as possible the appearance
    of the desired character. Semantics and gloss are mostly disregarded
    in forming an IDS. The exception to this is that sometimes semantic
    units aren't broken down, even if they could be.
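    Because an IDS describes only visual arrangement, it has a simple tree
    structure: each IDC is written first, followed by its components in
    visual order, and any component may itself be an IDS. The Python sketch
    below makes that prefix structure concrete. It is a minimal illustration
    that assumes only the twelve IDCs U+2FF0 to U+2FFB; the names `parse_ids`,
    `Node` and `IDC_ARITY` are hypothetical.

```python
# Minimal IDS parser sketch. Assumption: only the twelve IDCs U+2FF0..U+2FFB
# are recognised; IDCs added in later Unicode versions are not handled.
from dataclasses import dataclass
from typing import Union

# Every IDC in U+2FF0..U+2FFB takes two components, except ⿲ and ⿳ (three).
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3  # ⿲ left to middle and right
IDC_ARITY["\u2FF3"] = 3  # ⿳ above to middle and below


@dataclass
class Node:
    idc: str     # the describing character, e.g. "⿱"
    parts: list  # child components, in the order the IDS lists them


Tree = Union[str, Node]


def parse_ids(ids: str) -> Tree:
    """Parse an IDS written in prefix notation into a tree.

    Python strings are indexed by code point, so supplementary-plane
    components such as U+20006 count as a single leaf.
    Raises ValueError if the sequence is truncated or has trailing characters.
    """
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(ids):
            raise ValueError("IDS ends before all components were supplied")
        ch = ids[pos]
        pos += 1
        if ch in IDC_ARITY:
            # An IDC consumes exactly as many following subtrees as its arity.
            return Node(ch, [parse() for _ in range(IDC_ARITY[ch])])
        return ch  # any other character is a leaf component

    tree = parse()
    if pos != len(ids):
        raise ValueError("trailing characters after a complete IDS")
    return tree
```

    For example, parse_ids("⿱山夆") yields a ⿱ node whose parts are 山 and
    夆. A parser like this checks only the shape of the sequence: it accepts
    "⿶夆山" as readily as "⿶凵了", so whether an IDS really describes the
    intended character remains a visual judgment.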

    > ...
    > For this reason, I do think that ZWJ is not very suitable for that work,
    > TUS-IDS are better, but still insufficient... ...

    IDS are better, indeed. They are also insufficient. Please recall
    that the original poster estimates that the IDS would only be
    useful about 80% of the time. Even so, being able to generate
    glyphs for 80% of a large set of unencoded items would be
    very helpful for some of us.

    > For example, if there are some composed characters whose logical order is
    > from bottom to top, instead of top-to-bottom, the IDS will not describe this
    > correctly. If this ever occurs, will there be variants for the vertical
    > composition IDC? If some strokes of one component are moved to another
    > relative position or removed, how would you encode it? According to IDS,
    > you would break the semantics, as the initial non-composed ideograph
    > would no longer be there.

    I'm not sure if I'm understanding the question.

    Here are some examples from a listing of IDS for encoded characters. These
    examples are given in Annex S of ISO/IEC 10646 as being non-unifiable
    because of the different relative positions of their components, even
    though the components and their ordering are identical:


    When the IDS order is top-to-bottom, the appropriate IDC is used (⿱).
    When the IDS order is bottom-to-top, suitable IDCs exist: ⿶ and,
    possibly, ⿺.
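    The formal Unicode names of these IDCs spell out the geometry each one
    describes. This small sketch looks them up with Python's standard
    `unicodedata` module; the helper `idc_info` is hypothetical.

```python
import unicodedata


def idc_info(idc: str) -> tuple:
    """Return (code point label, Unicode character name) for one IDC."""
    return f"U+{ord(idc):04X}", unicodedata.name(idc)


# ⿱ (top to bottom) and the two IDCs mentioned for the bottom-to-top case.
for idc in ("\u2FF1", "\u2FF6", "\u2FFA"):
    label, name = idc_info(idc)
    print(label, idc, name)
# U+2FF1 ⿱ IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
# U+2FF6 ⿶ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW
# U+2FFA ⿺ IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
```

    In ⿶凵了, for instance, the first component, 凵, is the one that
    encloses the second from below, so the component listed first is the one
    drawn at the bottom.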

      𠙶 U+20676 ⿶凵了
      𠙷 U+20677 ⿶凵十
      𠙸 U+20678 ⿶凵厶
      𠙹 U+20679 ⿶凵𠀆
      𠙺 U+2067A ⿶凵千
      𠙼 U+2067C ⿶凵口

    But 峯 is ⿱山夆, with 山 written above 夆; "⿶夆山" would not be a
    valid IDS for it, because 夆 does not surround 山 from below.

    Best regards,

    James Kass

    This archive was generated by hypermail 2.1.5 : Fri Nov 02 2007 - 21:12:08 CST