RE: Compression through normalization

From: Jungshik Shin (jshin@mailaps.org)
Date: Wed Dec 03 2003 - 11:35:48 EST


    On Wed, 3 Dec 2003, Philippe Verdy wrote:

    > Jungshik Shin writes:
    > > > > I already answered about it: I had mixed the letters TLV instead of
    > > > > LVT. All the above was correct if you swap the letters. So what I did
    > > > > really was to compose only VT but not LV nor LVT:
    > > > >
    > > > > ( ((L* V* VT T*) - (L* V+ T)) | X )*
    > > > >
    > > > > I did it by using a leading filler (U+110B) to represent VT as an LVT
    > > > > syllable...
    > > >
    > > > But U+110B isn't a filler, it's a real letter, IEUNG. If you want a
    > > > choseong filler, you have to use U+115F. IEUNG is not equivalent to a
    > > > filler and can't be used to construct a so-called "VT syllable." For
    > > > example, (U+1100 + U+C544) is not equal to U+AC00.
    > >
    > > Doug is right. Philippe appears to have been confused by the fact that
    > > phonetically U+110B IEUNG is a 'null consonant' (the placeholder for
    > > syllables that begin with a vowel). In the Unicode sense, however,
    > > U+110B is not a filler but as genuine a letter as any other leading
    > > consonant.
    >
    > Oops. I should have read that part better. So my test was giving wrong
    ...
    > I do need to reread chapter 11.4... Which allows composing 19 leading
    > consonant jamos, 21 medial vowel jamos (399 johab syllables), and
    > optionally 27 trailing consonant jamos (10773 johab syllables). Plus
    > section 3.12 for conforming conjoining behavior of jamos.

       Note that Korean syllables in Unicode are NOT "LVT?" as you seem to think
    BUT "L+V+T*", where '+', '*' and '?' have their usual regular-expression
    meaning.
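
    As a rough illustration of the "L+V+T*" pattern, here is a minimal
    Python sketch. The code point ranges are an assumption: they simply
    partition the Hangul Jamo block (L = U+1100..U+115F, V = U+1160..U+11A7,
    T = U+11A8..U+11FF) and ignore any jamo encoded elsewhere. The names
    SYLLABLE and split_syllables are hypothetical.

```python
import re

# Assumed partition of the Hangul Jamo block (U+1100..U+11FF) only.
L = "[\u1100-\u115F]"   # leading consonants, including the filler U+115F
V = "[\u1160-\u11A7]"   # vowels, including the filler U+1160
T = "[\u11A8-\u11FF]"   # trailing consonants

# One or more L, one or more V, zero or more T.
SYLLABLE = re.compile(f"{L}+{V}+{T}*")

def split_syllables(text):
    """Return the jamo sequences in text that match L+V+T*."""
    return SYLLABLE.findall(text)

if __name__ == "__main__":
    hangul = "\u1112\u1161\u11AB\u1100\u1173\u11AF"   # two syllables in jamo
    for syllable in split_syllables(hangul):
        print([f"U+{ord(c):04X}" for c in syllable])
```

    A sequence that starts with the choseong filler, such as
    U+115F U+1161 U+11AB, also matches as one syllable, while a bare
    vowel plus trailing consonant without any L does not.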

    > rechecking it, you're right that this is not U+110B but U+115F. I wonder if
    > there's a way to use it to encode a VT syllable separately from the leading
    > consonant jamo that normally starts all modern Korean. I fear not, because
    > johab syllables can only start with a choseong in U+1100 to U+1112.

      Who said that? The 11,172 precomposed syllables are both *redundant*
    (they should never have been encoded) and *incomplete* even for modern
    Korean text. I prefer to use Korean letters (in the U+1100 block) for
    every single syllable of Korean, modern or not. We do need U+115F followed
    by 'V+T*' in modern Korean text in dictionaries, grammar books and
    linguistics texts.
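
    To make the redundancy concrete: every precomposed syllable maps
    arithmetically onto a jamo sequence. The sketch below follows the
    constants given for Hangul syllable decomposition in the Unicode
    Standard (section 3.12); the function name decompose_syllable is
    hypothetical, and the code handles one character at a time.

```python
# Constants for arithmetic Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT counts "no trailing consonant" too
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT              # 11172 precomposed syllables

def decompose_syllable(ch):
    """Return the jamo sequence for one precomposed syllable, else ch unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    jamo = chr(l) + chr(v)
    if t != T_BASE:          # index 0 means there is no trailing consonant
        jamo += chr(t)
    return jamo

if __name__ == "__main__":
    print(S_COUNT)
    print([f"U+{ord(c):04X}" for c in decompose_syllable("\uD55C")])
```

    The 11,172 total is 19 * 21 * 28: the 399 LV syllables plus the
    10,773 LVT syllables.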

    > attempt, and I did not see that I was in fact breaking the text by adding a
    > visible IEUNG. (It "may" be phonetically acceptable only if the vowel
    > encoded in the syllable is YE or YO or YU, but I'm not sure about it, and

       ???

    > So until there are new VT "syllables" (this would require 21*27=567 code
    > points, but one cannot locate them after the existing hangul syllables now
    > after U+D7A3, because it would require a free area U+D7A4..U+D9D9 which is
    > used partly for high surrogates starting at U+D800) encoded with excluded

      Come on!!! We do not want to encode any more precomposed syllables.
    Encoding 11,172 of them already ranks at the top of the list of things
    we would have done differently. Adding 567 more would NEVER NEVER happen
    even if there were room for them.

    > canonical decompositions for stability of decompositions, I fear that it's
    > impossible.

      I didn't receive the email in which you appear to have pondered a
    compression scheme, so I have no idea what's impossible here.

    > Now I wonder what is the exact role of the choseong filler U+115F in the
    > Hangul script except for allowing (not composable) VT syllables for foreign
    > or old words (starting with a vowel)

      Foreign/old syllables don't begin with U+115F. If they begin with
    vowels, U+110B IEUNG ('null consonant') should be used in the leading
    consonant slot.
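
    The difference between IEUNG and the filler shows up under
    normalization. A minimal sketch using Python's standard unicodedata
    module (the helper name compose is hypothetical):

```python
import unicodedata

def compose(text):
    """Apply NFC, which includes algorithmic Hangul syllable composition."""
    return unicodedata.normalize("NFC", text)

def show(text):
    return " ".join(f"U+{ord(c):04X}" for c in text)

if __name__ == "__main__":
    ieung_led = "\u110B\u1161\u11AB"    # IEUNG + A + NIEUN
    filler_led = "\u115F\u1161\u11AB"   # choseong filler + A + NIEUN
    print(show(compose(ieung_led)))     # composes to one precomposed syllable
    print(show(compose(filler_led)))    # stays as three separate code points
    print(compose("\u1100\uC544") == "\uAC00")   # KIYEOK + precomposed "A" syllable
```

    The IEUNG-led sequence composes to U+C548 because U+110B falls in
    the composable range U+1100..U+1112. The filler-led sequence is left
    untouched, and U+1100 followed by U+C544 never becomes U+AC00.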

    > and that can only be written with
    > separate jamos without forming a ligature with a possible previous leading
    > consonant (terminating another word)...

      See the above.

      Jungshik


