Re: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: Jungshik Shin (jshin@mailaps.org)
Date: Mon Dec 08 2003 - 13:57:47 EST


    On Mon, 8 Dec 2003, Peter Kirk wrote:

    > On 08/12/2003 08:37, Doug Ewell wrote:
    >
    > >Peter Kirk <peterkirk at qaya dot org> wrote:

    > >>I may have missed or misunderstood the details, but it has been
    > >>clearly stated here in the last few days that (a) there are more
    > >>than 11,000 redundant Korean characters in the BMP, and (b) many
    > >>precomposed Korean characters lack canonical or even compatibility
    > >>decompositions which would be desirable.

      You're another 'victim'(?!) of the multi-level representability of the
    Korean script. Although I consistently used the terms syllables and
    letters (Jamos: complex/compound vs simple/basic), it may not have been
    clear to you.

    Doug, thank you for the clear summary.

    > >Jungshik has been saying for years now that (a) the 11,172 precomposed
    > >syllables are redundant, since they can all be easily decomposed into
    > >jamos.

      Although I wasn't involved in encoding them, I wrote at least once in
    the early 1990s on a public mailing list that all of them would have
    to be encoded. So, I was certainly not free from the shortsightedness
    of the Koreans who pushed the proposal to encode them all. (I heard that
    the ballot was passed with a narrow margin.) I just have respect for
    the Indians behind ISCII (which was first published in 1988?). To be fair
    to my fellow Koreans, I must add that the need for encoding Hanjas (CJK
    ideographs used in Korea) made things complicated for Korean character
    sets (before Unicode).

    > > He also said recently that (b) the jamos that represent doubled
    > >sounds or "letter clusters" had compatibility equivalences in Unicode
    > >2.0, but these were subsequently removed, and that this too was a
    > >mistake.

      Actually, I have been saying this almost as long :-) I also have
    to add that there's at least one understandable reason for the removal
    (which is escaping me at the moment).

    > >So there are (a) 11000+ redundant Korean characters, and there are (b)
    > >Korean characters without decompositions. But there are not (a × b)
    > >"11000+ redundant Korean characters without decompositions."

    > Do the 11,172 precomposed syllables actually have canonical or
    > compatibility decompositions? Are they composition exclusions?

      Peter, if you just open up TUS 4.0 section 3.12 (referred to by
    Doug in his first reply on the issue), you would know. They're
    canonically equivalent and _not_ composition exclusions. If even that
    weren't the case, it'd be really disastrous.
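
      To make that concrete, here is a minimal Python sketch of the
    arithmetic decomposition described in TUS section 3.12, checked
    against the standard library's unicodedata module. The constants
    are the ones given in the standard; the function names
    (decompose_syllable, check_all_syllables) are only illustrative
    and not part of any API.

```python
import unicodedata

# Constants from the Hangul syllable decomposition algorithm (TUS section 3.12).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 vowel/trailing combinations per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch):
    """Arithmetically decompose one precomposed syllable into conjoining jamos.

    Illustrative helper: characters outside U+AC00..U+D7A3 are returned unchanged.
    """
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    jamos = chr(leading) + chr(vowel)
    if trailing != T_BASE:        # T_BASE itself means "no trailing consonant"
        jamos += chr(trailing)
    return jamos


def check_all_syllables():
    """Return True if every syllable round-trips through NFD and NFC."""
    for cp in range(S_BASE, S_BASE + S_COUNT):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        # Canonical decomposition must match the arithmetic one ...
        if nfd != decompose_syllable(ch):
            return False
        # ... and, the syllables not being composition exclusions,
        # NFC must recompose the jamo sequence to the original syllable.
        if unicodedata.normalize("NFC", nfd) != ch:
            return False
    return True
```

      If check_all_syllables() returns True, all 11,172 syllables
    decompose canonically into jamos and recompose under NFC, which is
    what "canonically equivalent and not composition exclusions" means
    in practice.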

      Jungshik


