RE: Compression through normalization

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 03 2003 - 06:27:39 EST

Next message: Michael Everson: "Re: MS Windows and Unicode 4.0 ?"

Previous message: Michael Everson: "RE: MS Windows and Unicode 4.0 ?"
Maybe in reply to: Philippe Verdy: "RE: Compression through normalization"
Next in thread: Jungshik Shin: "RE: Compression through normalization"
Reply: Jungshik Shin: "RE: Compression through normalization"

Jungshik Shin writes:
> > > I already answered that: I had mixed up the letters, writing TLV
> > > instead of LVT. All the above was correct if you swap the letters. So
> > > what I actually did was compose only VT, not LV or LVT:
> > >
> > > ( ((L* V* VT T*) - (L* V+ T)) | X )*
> > >
> > > I did it by using a leading filler (U+110B) to represent VT as an LVT
> > > syllable...
> >
> > But U+110B isn't a filler, it's a real letter, IEUNG. If you want a
> > choseong filler, you have to use U+115F. IEUNG is not equivalent to a
> > filler and can't be used to construct a so-called "VT syllable." For
> > example, (U+1100 + U+C544) is not equal to U+AC00.
>
> Doug is right. Philippe appears to have been confused by the fact that
> phonetically U+110B IEUNG is a 'null consonant' (the placeholder for
> syllables that begin with a vowel). In the Unicode sense, however, U+110B
> is not a filler but as genuine a letter as any other leading consonant.
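Doug's counterexample can be checked with Python's standard `unicodedata` module. This is only a minimal sketch; the variable names are illustrative:

```python
import unicodedata

kiyeok = "\u1100"       # HANGUL CHOSEONG KIYEOK (a leading consonant)
a_syllable = "\uC544"   # HANGUL SYLLABLE A = IEUNG + A
ga = "\uAC00"           # HANGUL SYLLABLE GA = KIYEOK + A

combined = unicodedata.normalize("NFC", kiyeok + a_syllable)
# U+C544 decomposes to U+110B U+1161. IEUNG is a real choseong, so it stays
# in the string and KIYEOK cannot compose with the vowel that follows it.
print([f"U+{ord(c):04X}" for c in combined])   # ['U+1100', 'U+C544']
print(combined == ga)                           # False
```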

Oops. I should have read that part more carefully. So my test was giving
wrong results: even though I knew it was not producing canonically
equivalent strings, I thought it was safe after looking at the list of
Unicode names generated by the compressor, because I don't know the
language...
I really need to reread section 11.4, which describes how 19 leading
consonant jamos combine with 21 medial vowel jamos (399 johab syllables),
optionally followed by one of 27 trailing consonant jamos (10773 more johab
syllables), as well as section 3.12 on conformant conjoining jamo behavior.
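These counts follow from the arithmetic Hangul syllable composition described in the Unicode Standard. A minimal Python sketch, assuming the standard base code points (the function name `compose_lvt` is hypothetical, not a library API):

```python
from typing import Optional

# Constants of the algorithmic Hangul syllable composition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT counts "no trailing jamo" as index 0
N_COUNT = V_COUNT * T_COUNT              # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT              # 11172 precomposed syllables in total


def compose_lvt(l: int, v: int, t: Optional[int] = None) -> int:
    """Map a leading, medial and optional trailing jamo to its precomposed syllable."""
    l_index = l - L_BASE
    v_index = v - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("L or V jamo outside the composable ranges")
    t_index = 0
    if t is not None:
        t_index = t - T_BASE
        # U+11A7 itself is not a trailing consonant; valid indexes are 1..27.
        if not (0 < t_index < T_COUNT):
            raise ValueError("T jamo outside the composable range")
    return S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index


lv_count = L_COUNT * V_COUNT            # 399 LV syllables
lvt_count = lv_count * (T_COUNT - 1)    # 10773 LVT syllables
print(lv_count, lvt_count, lv_count + lvt_count == S_COUNT)  # 399 10773 True
print(hex(compose_lvt(0x1100, 0x1161)))  # 0xac00 (GA)
print(hex(compose_lvt(0x110B, 0x1161)))  # 0xc544 (A, written with IEUNG)
```

Note that the choseong filler U+115F falls outside the L range checked here, so `compose_lvt(0x115F, 0x1161)` raises `ValueError`.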

I knew that there was a choseong filler among the leading consonants, and
on rechecking, you're right that it is U+115F, not U+110B. I wonder whether
there is a way to use it to encode a VT syllable without the leading
consonant jamo that normally starts every modern Korean syllable. I fear
not, because johab syllables can only start with a choseong in the range
U+1100..U+1112.
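One way to confirm this is to run both kinds of sequence through NFC with Python's `unicodedata`; a small sketch (the helper `show` is hypothetical and only formats code points):

```python
import unicodedata


def show(s):
    """Format a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# A regular L + V + T sequence composes into one precomposed syllable.
print(show(unicodedata.normalize("NFC", "\u1100\u1161\u11A8")))  # U+AC01
# With the choseong filler U+115F in the L position, nothing composes:
# U+115F lies outside U+1100..U+1112, so the jamos stay separate.
print(show(unicodedata.normalize("NFC", "\u115F\u1161\u11A8")))  # U+115F U+1161 U+11A8
```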

This is a place where the code charts for Hangul jamos should show more
precisely the three subsets of jamos usable in johab syllables. I had only
looked at the normative names of the Hangul syllables to check my
compression attempt, and I did not see that I was in fact breaking the text
by adding a visible IEUNG. (It "may" be phonetically acceptable only if the
vowel encoded in the syllable is YE, YO or YU, but I'm not sure about it,
and you're right that it would break the normal orthography of Korean
words.)
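The three subsets can at least be listed programmatically. A minimal sketch, assuming the ranges U+1100..U+1112 (L), U+1161..U+1175 (V) and U+11A8..U+11C2 (T); `JOHAB_SUBSETS` and `johab_role` are hypothetical names:

```python
import unicodedata

# Hypothetical table: the conjoining jamos that take part in the
# arithmetic composition of precomposed Hangul syllables.
JOHAB_SUBSETS = {
    "L": range(0x1100, 0x1113),  # 19 choseong, KIYEOK..HIEUH
    "V": range(0x1161, 0x1176),  # 21 jungseong, A..I
    "T": range(0x11A8, 0x11C3),  # 27 jongseong, KIYEOK..HIEUH
}


def johab_role(ch):
    """Return 'L', 'V' or 'T' for a composable jamo, else None."""
    cp = ord(ch)
    for role, cps in JOHAB_SUBSETS.items():
        if cp in cps:
            return role
    return None


# IEUNG is an ordinary composable choseong; the two fillers are not composable.
for ch in "\u110B\u115F\u1160":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {johab_role(ch)}")
```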

So unless new VT "syllables" are encoded, with their canonical
decompositions excluded from composition to keep normalization stable, I
fear that it's impossible. That would require 21*27=567 code points, and
they cannot be placed right after the existing Hangul syllables (which end
at U+D7A3), because that would require the free range U+D7A4..U+D9DA, part
of which is already used by the high surrogates starting at U+D800.
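The arithmetic behind this can be checked in a few lines; note that 567 code points starting at U+D7A4 end at U+D9DA inclusive. The constant names are illustrative:

```python
# Where would 21 * 27 new VT syllables land if appended after U+D7A3?
LAST_SYLLABLE = 0xD7A3          # HANGUL SYLLABLE HIH, last precomposed syllable
HIGH_SURROGATE_START = 0xD800

vt_count = 21 * 27                     # medial vowels x trailing consonants
first = LAST_SYLLABLE + 1              # 0xD7A4
last = first + vt_count - 1            # inclusive end of the needed block
print(vt_count, f"U+{first:04X}..U+{last:04X}", last >= HIGH_SURROGATE_START)
# 567 U+D7A4..U+D9DA True
```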

Now I wonder what exactly the role of the choseong filler U+115F is in the
Hangul script, other than allowing (non-composable) VT syllables for
foreign or archaic words that start with a vowel. Such syllables can only
be written with separate jamos, without forming a ligature with a leading
consonant that may precede them (ending another word)...

This archive was generated by hypermail 2.1.5 : Wed Dec 03 2003 - 07:05:11 EST