RE: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Nov 25 2003 - 12:02:08 EST

Next message: John Cowan: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"

Previous message: Philippe Verdy: "RE: Normalisation stability, was: Compression through normalization"
In reply to: John Cowan: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Next in thread: John Cowan: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Reply: John Cowan: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"
Reply: Doug Ewell: "Re: Korean compression (was: Re: Ternary search trees for Unicode dictionaries)"

John Cowan writes:
> > You are, because the floodgates, while once open, have been closed by
> > normalization.
>
> Indeed, they were opened in Unicode 1.1, as a result of the merger with
> FDIS 10646; since then, only 46 characters with canonical decompositions
> have been added to Unicode (excepting compatibility ideographs, which
> are a special case).

In fact, ISO 10646 is meant to allow an easy one-to-one mapping between
existing standard coded character sets (CCS) and unified code points.
Accepting precomposed characters is therefore a necessity whenever
precomposed characters exist in a legacy CCS standard. But they are included
only for compatibility (exactly as compatibility ideographs are).
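
To make the one-to-one mapping point concrete, here is a minimal sketch in
Python. It assumes ISO 8859-1 (Python's "latin-1" codec) as the legacy CCS,
which is my choice of example and not one named above: the precomposed
letter round-trips as a single code point, while the decomposed sequence
cannot be expressed in the legacy set at all.

```python
# Assumption: ISO 8859-1 ("latin-1") stands in for "a legacy CCS".
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# One legacy byte maps to exactly one code point, and back again.
legacy_bytes = precomposed.encode("latin-1")
round_tripped = legacy_bytes.decode("latin-1")

# The decomposed form has no representation in the legacy set,
# because the combining accent is not part of ISO 8859-1.
try:
    decomposed.encode("latin-1")
    decomposed_fits = True
except UnicodeEncodeError:
    decomposed_fits = False

print(legacy_bytes, round_tripped == precomposed, decomposed_fits)
```

Without the precomposed code point, converting from the legacy set to
Unicode and back would need a composition step instead of a plain table
lookup.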

The Latin letters with two diacritics added in Latin Extension B do not seem
to respect this constraint: they are not justified by the Vietnamese VISCII
standard, which contains no characters with two diacritics and instead
composes them from two characters within its limited CCS.
I don't know why even ISO 10646 would have needed them, unless there is some
Vietnamese DBCS standard that can represent, in a 94x94 matrix, all letters
with two diacritics as well as the Han ideographs used in Vietnamese. I
looked in the IBM database of charsets (CCS+CES) and could not find any
reference to such an EUC-style DBCS. So was it because there was an ongoing
or unfinished DBCS standard for Vietnamese, working like GBK, SJIS or
KSC 5601?
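
For readers who want to see what a "letter with two diacritics" looks like
in terms of canonical decomposition, here is a small sketch using Python's
standard unicodedata module. The choice of U+1EA5 as the sample letter is
mine, purely for illustration.

```python
import unicodedata

# Illustrative sample: LATIN SMALL LETTER A WITH CIRCUMFLEX AND ACUTE.
precomposed = "\u1EA5"

# NFD applies the canonical decomposition recursively:
# base letter, then each combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
code_points = [f"U+{ord(ch):04X}" for ch in decomposed]

# NFC recomposes the sequence into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points)                 # ['U+0061', 'U+0302', 'U+0301']
print(recomposed == precomposed)   # True
```

The precomposed code point and the three-code-point sequence are
canonically equivalent, so normalization treats them as the same text.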



This archive was generated by hypermail 2.1.5 : Tue Nov 25 2003 - 13:05:21 EST