Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Aug 06 2003 - 10:26:42 EDT


    On Wednesday, August 06, 2003 12:38 PM, Kent Karlsson <kentk@cs.chalmers.se> wrote:
    > Since I think <a, ring above, cgj, dot below> should be canonically
    > equivalent to <a, dot below, cgj, ring above>, but cannot be made
    > so (now), the only ways out seem to be to either formally deprecate
    > CGJ, or at least confine it to very specific uses. Other occurrences
    > would not be ill-formed or illegal, but would then be non-conforming.

    There is a way to specify that <A, RingAbove, CGJ, DotBelow> is
    well-formed but <A, DotBelow, CGJ, RingAbove> is not:
    a CGJ can be authorized in a combining sequence only if it
    precedes a base character, or precedes a combining character
    whose combining class is strictly lower than the combining class
    of the character preceding the CGJ.
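
    The rule above can be sketched in Python as a check over every CGJ
    in a string. This is only an illustration, not part of any
    standard: the names cgj_is_authorized and is_well_formed are
    hypothetical, any character of combining class 0 is treated as a
    "base character", a trailing CGJ is assumed to be unauthorized,
    and a CGJ with nothing before it is compared against class 0.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0


def cgj_is_authorized(text: str, i: int) -> bool:
    """Apply the proposed rule to the CGJ found at index i of text."""
    if i + 1 >= len(text):
        # Assumption: a trailing CGJ precedes nothing, so it is not authorized.
        return False
    next_ccc = unicodedata.combining(text[i + 1])
    if next_ccc == 0:
        # Simplification: every class-0 character counts as a base character.
        return True
    # Assumption: a CGJ at the very start is compared against class 0.
    prev_ccc = unicodedata.combining(text[i - 1]) if i > 0 else 0
    # Authorized only if the following mark has a strictly lower class.
    return next_ccc < prev_ccc


def is_well_formed(text: str) -> bool:
    """True if every CGJ in text satisfies the proposed rule."""
    return all(
        cgj_is_authorized(text, i) for i, ch in enumerate(text) if ch == CGJ
    )
```

    For instance, is_well_formed("A\u030A\u034F\u0323") holds for
    <A, RingAbove, CGJ, DotBelow>, while the reversed sequence
    "A\u0323\u034F\u030A" is rejected.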

    So, with this definition, with the combining classes indicated:

    - <A=0, RingAbove=230, CGJ=0, DotBelow=220>
      is well-formed because 220 < 230. It is distinct from:
      <A=0, RingAbove=230, DotBelow=220>, whose canonical
      ordering is
      <A=0, DotBelow=220, RingAbove=230>

    - <A=0, DotBelow=220, CGJ=0, RingAbove=230>
      is ill-formed because 230 > 220. The CGJ is superfluous
      and should be removed to create:
      <A=0, DotBelow=220, RingAbove=230>

    - <A=0, DotBelow=220, CGJ=0, MacronBelow=220>
      is ill-formed because 220 = 220. The CGJ is superfluous
      and should be removed to create:
      <A=0, DotBelow=220, MacronBelow=220>
      which is well-formed and in canonical order.

    - <A=0, MacronBelow=220, CGJ=0, DotBelow=220>
      is ill-formed because 220 = 220. The CGJ is superfluous
      and should be removed to create:
      <A=0, MacronBelow=220, DotBelow=220>
      which is well-formed and in canonical order.
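
    The "should be removed" step in the ill-formed cases above could
    look like the following sketch. The function name
    strip_superfluous_cgj is hypothetical, and a trailing CGJ is left
    untouched because the rule does not say what to do with it. The
    equal-class case is exercised with U+0331 COMBINING MACRON BELOW
    as the second class-220 mark.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def strip_superfluous_cgj(text: str) -> str:
    """Drop each CGJ that the proposed rule would call superfluous."""
    out = []
    for i, ch in enumerate(text):
        if ch == CGJ and i + 1 < len(text):
            next_ccc = unicodedata.combining(text[i + 1])
            # Assumption: a CGJ with nothing before it is compared to class 0.
            prev_ccc = unicodedata.combining(text[i - 1]) if i > 0 else 0
            # Superfluous: followed by a combining mark whose class is not
            # strictly lower than the class of the character before the CGJ.
            if next_ccc > 0 and next_ccc >= prev_ccc:
                continue
        out.append(ch)
    return "".join(out)


ring_then_dot = "A\u030A\u034F\u0323"   # <A, RingAbove, CGJ, DotBelow>
dot_then_ring = "A\u0323\u034F\u030A"   # <A, DotBelow, CGJ, RingAbove>
equal_classes = "A\u0323\u034F\u0331"   # <A, DotBelow, CGJ, MacronBelow>

kept = strip_superfluous_cgj(ring_then_dot)       # CGJ stays
cleaned = strip_superfluous_cgj(dot_then_ring)    # CGJ removed
cleaned_eq = strip_superfluous_cgj(equal_classes) # CGJ removed
```

    In both cleaned results the remaining marks are already in
    canonical order, so NFD normalization leaves them unchanged.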

    This "well-formed" rule would give CGJ exact semantics: used
    in the middle of a combining sequence, it would be the only
    way to bypass the canonical reordering of combining
    characters.
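
    The reordering bypass can be observed with Python's standard
    unicodedata module. This is a small demonstration rather than a
    definitive test; the helper code_points is a hypothetical name
    used only to make the results readable.

```python
import unicodedata

A = "A"
RING_ABOVE = "\u030A"  # combining class 230
DOT_BELOW = "\u0323"   # combining class 220
CGJ = "\u034F"         # combining class 0


def code_points(s: str) -> list:
    """Render a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in s]


plain = A + RING_ABOVE + DOT_BELOW
joined = A + RING_ABOVE + CGJ + DOT_BELOW

# Without CGJ, canonical ordering moves the class-220 mark first.
nfd_plain = unicodedata.normalize("NFD", plain)
# With CGJ (class 0) in between, the two marks are never compared,
# so the original order survives normalization.
nfd_joined = unicodedata.normalize("NFD", joined)

print(code_points(nfd_plain))   # ['U+0041', 'U+0323', 'U+030A']
print(code_points(nfd_joined))  # ['U+0041', 'U+030A', 'U+034F', 'U+0323']
```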


