RE: Compression through normalization

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 03 2003 - 12:55:53 EST


    > From: Jungshik Shin [mailto:jshin@mailaps.org]
    > Note that Korean syllables in Unicode are NOT "LVT?" as you
    > seem to think
    I did not say that...

    > BUT "L+V+T*" with '+', '*' and '?' have usual RE meaning.

    I said this:
     ( ((L* V* VT T*) - (L* V+ T)) | X )*

    > Who said that? 11,172 precomposed syllables are both *redundant*
    > (should have never been encoded) and *incomplete* even for modern Korean
    > text. I prefer to use Korean letters (in U+1100 block) for every single
    > syllables of Korean, modern or not. We do need U+115F followed by 'V+T*'
    > in modern Korean text in dictionaries, grammar books and linguistics text.

    OK, this choseong filler makes sense for vowel-initial syllables, to make
    them appear as if they had the L+V+T form. I still doubt that this is
    really needed, unless the intent is to detach the vowel from a possible
    preceding trailing consonant in <L0,V0,T0>, so that it does not form a
    ligature with it: otherwise <L0,V0,T0,V1,T1> could be composed as
    <L0+V0>,<T0+V1+T1>, with T0 converted to a leading consonant.
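
    To illustrate the point about the filler, here is a minimal Python
    sketch of the quoted "L+V+T*" pattern applied to <L0,V0,T0,V1,T1>, with
    and without U+115F before V1. The character classes are an assumption:
    they simply split the Hangul Jamo block (U+1100..U+11FF) into three
    ranges, and the helper name segment() is only illustrative.

```python
import re

# Assumed ranges, taken from the layout of the Hangul Jamo block.
L = "[\u1100-\u115F]"  # leading consonants, including the choseong filler U+115F
V = "[\u1160-\u11A7]"  # vowels, including the jungseong filler U+1160
T = "[\u11A8-\u11FF]"  # trailing consonants

# The quoted pattern: one or more L, one or more V, zero or more T.
SYLLABLE = re.compile(f"{L}+{V}+{T}*")

def segment(text):
    """Return the substrings of text that match L+V+T*, left to right."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]

# <L0,V0,T0,V1,T1> without a filler: V1 and T1 do not start a syllable.
plain = "\u1112\u1161\u11AB\u1161\u11AF"
# The same sequence with the choseong filler inserted before V1.
filled = "\u1112\u1161\u11AB\u115F\u1161\u11AF"

print(len(segment(plain)), len(segment(filled)))
```

    Under these assumptions only the first three jamos of the unfilled
    sequence match the pattern, while the filled sequence splits into two
    matches, <L0,V0,T0> and <Lf,V1,T1>.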

    > Come on!!! We do not want to encode any more precomposed syllables.
    > Encoding 11,172 of them already ranks top in the list of things we'd
    > have done differently. Adding 567 more would NEVER NEVER happen even if
    > there's room for them.

    What about the existing "compatibility Hangul syllables" starting with
    vowels? Are they really distinct from the jamos that compose them, as
    if they were decomposed to a leading choseong filler, a vowel and a
    consonant? What would happen if a compressor chose to compress
    occurrences of <Lf,V,T> to these compatibility vowel-initial syllables
    by using a mapping to an internal charset, and then reversed the
    compression back to separate Lf, V, T in Unicode?
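
    The compress-and-reverse scenario in the question above can be sketched
    as follows. This is only an illustration under stated assumptions: the
    internal charset is hypothetical (private-use code points from U+E000),
    the input is assumed not to contain those code points already, and only
    the 21 modern vowels and 27 modern trailing consonants are covered,
    which gives 21 * 27 = 567 combinations.

```python
import re

LF = 0x115F                    # HANGUL CHOSEONG FILLER
V_BASE, V_COUNT = 0x1161, 21   # modern vowels U+1161..U+1175
T_BASE, T_COUNT = 0x11A8, 27   # modern trailing consonants U+11A8..U+11C2
INTERNAL_BASE = 0xE000         # hypothetical internal charset (private use)

# One <Lf,V,T> triple in the source text.
TRIPLE = re.compile("\u115F([\u1161-\u1175])([\u11A8-\u11C2])")
# One internal code: 567 values, U+E000..U+E236.
INTERNAL = re.compile("[\uE000-\uE236]")

def compress(text):
    """Replace every <Lf,V,T> triple by a single internal code."""
    def encode(m):
        v = ord(m.group(1)) - V_BASE
        t = ord(m.group(2)) - T_BASE
        return chr(INTERNAL_BASE + v * T_COUNT + t)
    return TRIPLE.sub(encode, text)

def decompress(text):
    """Expand every internal code back to the <Lf,V,T> triple it stands for."""
    def decode(m):
        v, t = divmod(ord(m.group(0)) - INTERNAL_BASE, T_COUNT)
        return chr(LF) + chr(V_BASE + v) + chr(T_BASE + t)
    return INTERNAL.sub(decode, text)

sample = "\u1112\u1161\u11AB\u115F\u1161\u11AF"
packed = compress(sample)
print(len(sample), len(packed), decompress(packed) == sample)
```

    Because the mapping is one-to-one on the triples it covers, the round
    trip restores the original jamo sequence. Whether the intermediate
    codes could instead be existing Unicode characters is exactly the open
    question asked above.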

    I've just read about the interesting Bytext.org approach, and what I
    proposed seems to have also been considered by them in their 8-bit
    encoding (which does not preserve strict Unicode canonical equivalence,
    but seems to have been created to preserve the structure of the Hangul
    script).

    Converting a Hangul text coded with the Bytext.org encoding to Unicode
    would certainly force a design choice in the mapper: whether or not to
    use compatibility Hangul syllables...


    This archive was generated by hypermail 2.1.5 : Wed Dec 03 2003 - 18:07:33 EST