RE: Compression through normalization

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 03 2003 - 12:55:53 EST

Next message: Edward H. Trager: "Re: Free Fonts"

Previous message: Michael Everson: "Re: MS Windows and Unicode 4.0 ?"
In reply to: Jungshik Shin: "RE: Compression through normalization"
Next in thread: Philippe Verdy: "RE: Compression through normalization"

> From: Jungshik Shin [mailto:jshin@mailaps.org]
> Note that Korean syllables in Unicode are NOT "LVT?" as you
> seem to think
I did not say that...

> BUT "L+V+T*", with '+', '*' and '?' having their usual RE meanings.

I said this:
( ((L* V* VT T*) - (L* V+ T)) | X )*
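For comparison, Jungshik's "L+V+T*" pattern can be written as an ordinary regular expression over single syllable blocks. This is a minimal Python sketch, assuming the conjoining-jamo ranges of current Unicode versions (the Hangul Jamo Extended-A/B ranges were added after this message was written); the names `L`, `V`, `T` and `is_standard_syllable` are illustrative only. It does not implement the set difference in the expression above, which Python's `re` module cannot express directly.

```python
import re

# Conjoining jamo classes (assumed: current Unicode ranges, including
# the Hangul Jamo Extended-A/B additions). U+115F, the choseong filler,
# falls inside the leading-consonant (L) range.
L = "[\u1100-\u115F\uA960-\uA97C]"
V = "[\u1160-\u11A7\uD7B0-\uD7C6]"
T = "[\u11A8-\u11FF\uD7CB-\uD7FB]"

# One standard syllable block: L+ V+ T*
STANDARD_SYLLABLE = re.compile(f"{L}+{V}+{T}*")


def is_standard_syllable(seq: str) -> bool:
    """True if the whole jamo sequence is a single L+V+T* block."""
    return STANDARD_SYLLABLE.fullmatch(seq) is not None


# A vowel-initial block only matches once the choseong filler is prefixed.
print(is_standard_syllable("\u1161\u11A8"))        # False
print(is_standard_syllable("\u115F\u1161\u11A8"))  # True
```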

> Who said that? 11,172 precomposed syllables are both *redundant*
> (should have never been encoded) and *incomplete* even for modern Korean
> text. I prefer to use Korean letters (in U+1100 block) for every single
> syllable of Korean, modern or not. We do need U+115F followed by 'V+T*'
> in modern Korean text in dictionaries, grammar books and linguistics text.

OK, this choseong filler makes sense for vowel-initial syllables, to make
them appear as if they were in L+V+T form. I still doubt that it is really
needed, unless the intent is to detach the vowel from a possible preceding
trailing consonant in <L0,V0,T0>, so that it does not form a ligature with
it; otherwise <L0,V0,T0,V1,T1> could be composed as <L0+V0>,<T0+V1+T1>,
with T0 reinterpreted as a leading consonant.
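The question of where a syllable block ends can be made concrete with the Hangul rules of the extended grapheme cluster boundaries (rules GB6 to GB8 in UAX #29, a later codification of the conjoining jamo behavior). The Python sketch below applies only those three rules and treats every other character as its own cluster, so it is not a full segmenter; the helper names `jamo_type`, `no_break` and `syllable_blocks` are hypothetical. Under these rules a T followed by a V already starts a new block, so the filler does not change the boundary; what it changes is that the second block becomes a standard L+V+T* syllable.

```python
from typing import List, Optional


def jamo_type(ch: str) -> Optional[str]:
    """Approximate Hangul_Syllable_Type of a character (None if not Hangul)."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: LV when there is no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None


def no_break(prev: str, cur: str) -> bool:
    """True when the Hangul rules forbid a boundary between prev and cur."""
    if prev == "L" and cur in ("L", "V", "LV", "LVT"):   # GB6
        return True
    if prev in ("LV", "V") and cur in ("V", "T"):        # GB7
        return True
    if prev in ("LVT", "T") and cur == "T":              # GB8
        return True
    return False


def syllable_blocks(text: str) -> List[str]:
    """Split text into blocks using only the Hangul rules above."""
    blocks: List[str] = []
    prev: Optional[str] = None
    for ch in text:
        cur = jamo_type(ch)
        if blocks and prev is not None and cur is not None and no_break(prev, cur):
            blocks[-1] += ch
        else:
            blocks.append(ch)
        prev = cur
    return blocks


# <L0,V0,T0,V1,T1> without and with the choseong filler U+115F
print(syllable_blocks("\u1100\u1161\u11A8\u1161\u11A8"))
print(syllable_blocks("\u1100\u1161\u11A8\u115F\u1161\u11A8"))
```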

> Come on!!! We do not want to encode any more precomposed syllables.
> Encoding 11,172 of them already ranks top in the list of things we'd
> have done differently. Adding 567 more would NEVER NEVER happen even if
> there's room for them.

What about the existing "compatibility Hangul syllables" starting with
vowels? Are they really distinct from the jamos that compose them, as
if they were decomposed into a leading choseong filler, a vowel and a
consonant? What would happen if a compressor chose to compress
occurrences of <LF,V,T> into these compatibility vowel-initial syllables
by mapping them to an internal charset, and then reversed the compression
back to separate LF, V, T in Unicode?
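One data point for this question: the standard arithmetic decomposition of the precomposed syllables U+AC00..U+D7A3 (the Hangul Syllable Decomposition algorithm in chapter 3 of the Unicode Standard) always yields a leading consonant in U+1100..U+1112, never the filler U+115F. A syllable that sounds vowel-initial, such as U+C544, decomposes to U+110B (ieung) + U+1161. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and the final check compares it with Python's `unicodedata` NFD. It suggests that a compressor mapping <U+115F,V,T> onto precomposed code points would need its own private convention to round-trip, because canonical decomposition would not restore the filler.

```python
import unicodedata
from typing import Optional

# Constants of the Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables


def decompose_syllable(ch: str) -> str:
    """Canonical decomposition of one precomposed syllable (else unchanged)."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l = L_BASE + s_index // N_COUNT
    v = V_BASE + (s_index % N_COUNT) // T_COUNT
    t = T_BASE + s_index % T_COUNT
    return chr(l) + chr(v) + (chr(t) if t != T_BASE else "")


def compose_syllable(seq: str) -> Optional[str]:
    """Precomposed syllable for <L,V> or <L,V,T>, or None if none exists."""
    if len(seq) not in (2, 3):
        return None
    l_index = ord(seq[0]) - L_BASE
    v_index = ord(seq[1]) - V_BASE
    t_index = ord(seq[2]) - T_BASE if len(seq) == 3 else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        return None  # e.g. the choseong filler U+115F has no precomposed form
    if len(seq) == 3 and not 0 < t_index < T_COUNT:
        return None
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


# Cross-check the arithmetic against Python's own NFD tables.
assert all(
    decompose_syllable(chr(cp)) == unicodedata.normalize("NFD", chr(cp))
    for cp in range(S_BASE, S_BASE + S_COUNT)
)
```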

I've just read about the interesting Bytext.org approach, and they seem to
have had the same idea as what I proposed, in their 8-bit encoding (which
does not preserve strict Unicode canonical equivalence, but seems to have
been designed to preserve the structure of the Hangul script).

Converting Hangul text coded in the Bytext.org encoding to Unicode
would certainly force the mapper to make a design choice about whether or
not to use compatibility Hangul syllables...




This archive was generated by hypermail 2.1.5 : Wed Dec 03 2003 - 18:07:33 EST