RE: Compression through normalization

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Dec 04 2003 - 07:36:46 EST

Next message: Peter Kirk: "Re: MS Windows and Unicode 4.0 ?"

Previous message: Kent Karlsson: "RE: Compression through normalization"
In reply to: Kent Karlsson: "RE: Compression through normalization"
Next in thread: Kent Karlsson: "RE: Compression through normalization"
Reply: Kent Karlsson: "RE: Compression through normalization"
Reply: Doug Ewell: "Re: Compression through normalization"

Kent Karlsson writes:
> Philippe Verdy wrote:
>
> > I just have another question for Korean: many jamos are in fact
> > composed from other jamos: this is clearly visible both in their name
> > and in their composed glyph. What would be the linguistic impact of
> > decomposing them (not canonically!)? Do Koreans really learn these
> > jamos without breaking them into their components? I think here
> > about SSANG (double) consonants, or the initial Y or final E
> > of some vowels...
> > Of course I won't be able to use such decomposition in Unicode,
>
> Of course you, and anyone else, can. Just as well as one can use spell
> checkers/correctors, transform digits between scripts, do transcriptions,
> or any other kind of processing on Unicode texts. It cannot be part of
> normalisation, though. And I agree that in this case that is unfortunate,
> since the letter cluster jamos really consist of sequences of two or more
> letters each. Fortunately, the definition of Hangul syllable blocks need
> not be changed, as it works well with Hangul syllables as L+, V+, T*
> (where L, V, and T stand for single-letter jamos).

In fact, the Unicode encoding of modern Hangul syllables is more
accurately described as:

(Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)*

where Ls, Vs, Ts are single-letter modern L, V, T jamos
and Lm, Vm, Tm are multiple-letter modern L, V, T jamos
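
As a rough illustration of the pattern above, here is a minimal Python
sketch that matches runs of modern conjoining jamos. Since (Ls|Lm) together
cover all modern L jamos (and likewise for V and T), the pattern reduces to
L+ V+ T* over the modern jamo ranges. The ranges U+1100..U+1112,
U+1161..U+1175 and U+11A8..U+11C2 are the modern conjoining jamos; the
names SYLLABLE and find_syllables are hypothetical, chosen only for this
example.

```python
import re
import unicodedata

# Modern conjoining jamo ranges: leading (L), vowel (V), trailing (T).
# (Ls|Lm)+ (Vs|Vm)+ (Ts|Tm)* reduces to L+ V+ T* over these ranges.
SYLLABLE = re.compile("[\u1100-\u1112]+[\u1161-\u1175]+[\u11A8-\u11C2]*")

def find_syllables(text: str) -> list[str]:
    """Return the jamo runs in `text` that match L+ V+ T*.

    The text is first put in NFD so that precomposed syllables are
    expanded into their conjoining jamos.
    """
    return SYLLABLE.findall(unicodedata.normalize("NFD", text))
```

Calling find_syllables on a string of two precomposed syllables should
return two jamo runs, one per syllable.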

The idea is to allow decomposing Lm, Vm, or Tm into sequences of
Ls, Vs, or Ts using supplementary (non-canonical) decompositions,
including for the compatibility Hangul syllables.
This would effectively produce syllables encoded only as
Ls+ Vs+ Ts*
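
A minimal sketch of such a supplementary decomposition, for the modern
cluster consonants only, could look like the Python below. It is an
assumption of this example (not something defined by Unicode) that the
split can be derived from the character names: a SSANG prefix means the
letter is doubled, and a hyphenated name lists the component letters. The
function name split_cluster_consonant is hypothetical. Compound vowels
would need a hand-written table and are left unchanged here.

```python
import unicodedata

def split_cluster_consonant(jamo: str) -> str:
    """Split a modern cluster consonant jamo into single-letter jamos.

    Non-canonical: this is NOT part of any Unicode normalization form.
    Anything that is not a modern L or T cluster is returned unchanged.
    """
    cp = ord(jamo)
    if 0x1100 <= cp <= 0x1112:
        prefix = "HANGUL CHOSEONG "    # leading consonants (L)
    elif 0x11A8 <= cp <= 0x11C2:
        prefix = "HANGUL JONGSEONG "   # trailing consonants (T)
    else:
        return jamo
    letter = unicodedata.name(jamo)[len(prefix):]
    if letter.startswith("SSANG"):
        parts = [letter[len("SSANG"):]] * 2   # doubled letter
    elif "-" in letter:
        parts = letter.split("-")             # e.g. KIYEOK-SIOS
    else:
        return jamo                           # already a single letter
    return "".join(unicodedata.lookup(prefix + part) for part in parts)
```

Applying this function to every character of an NFD string would yield
consonant positions made only of single-letter jamos, while vowels stay
as they were.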

The next step is to recompose them as far as possible into Lm, Vm, Tm
jamos, and then to reassemble these into either Johab syllables (LV or
LVT) or some compatibility syllables (historic syllables starting
with vowels).
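
The final reassembly of one L, one V and an optional T into a precomposed
LV or LVT syllable can be sketched with the arithmetic given in the
Unicode Standard for Hangul syllable composition. The Python below is
only an illustration of that last step; it assumes the cluster jamos have
already been recomposed, and the function name compose_syllable is
hypothetical.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def compose_syllable(l: str, v: str, t: str = "") -> str:
    """Compose modern jamos L, V and optional T into one LV/LVT syllable."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t else 0   # 0 means "no trailing consonant"
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# Cross-check against the standard NFC composition for one syllable.
assert compose_syllable("\u1112", "\u1161", "\u11ab") == \
    unicodedata.normalize("NFC", "\u1112\u1161\u11ab")
```

In practice NFC already performs this step; the explicit arithmetic is
shown only to make the LV/LVT reassembly concrete.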

This process seems to match Korean readers' interpretation of
Hangul syllables, and it matches the description in the N954.PDF
working document of JTC1/SC22/WG20.

At least it has the merit of allowing unification of uncomposed SSANG
consonants, or uncomposed Y or E vowels, which may appear even within
a text using only modern jamos or Johab syllables. It also simplifies
the preparation of Hangul texts for UCA.




This archive was generated by hypermail 2.1.5 : Thu Dec 04 2003 - 08:24:35 EST