From: Philippe Verdy (email@example.com)
Date: Sat Jun 11 2005 - 15:03:48 CDT
----- Original Message -----
From: "Theodore H. Smith" <firstname.lastname@example.org>
To: "ecartis" <email@example.com>
Sent: Saturday, June 11, 2005 9:27 PM
Subject: UnicodeData.txt is invalid, flawed, broken, corrupt and wrong
> No one from the official Unicode.org company replied to me last time, so
> I'll try again.
> Why is it that the entry for Kelvin (a measurement of temperature), has a
> decomposition, which is listed as a canonical decomposition, to the
> standard ASCII "K"?
> This decomposition is actually a compatibility decomposition.
> How does this cause me problems? I've written a parser for
> UnicodeData.txt. This parser will extract data for decomposition, and for
> composition also.
> Because Kelvin canonically decomposes to K, it follows that K
> canonically composes to Kelvin! :o(
> So my composer will change a word like this: "Kitchen", into "(Kelvin)
> itchen". Which is just totally wrong. All because UnicodeData.txt is
"Kitchen" will remain "Kitchen" in all normalized forms.
Only "+265(Kelvin)" will eventually become "+265K".
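You can check this yourself with any conformant normalizer. Here is a minimal sketch using Python's standard unicodedata module (my choice for illustration, not something from the original question); KELVIN and FORMS are just names I introduce here.

```python
import unicodedata

KELVIN = "\u212A"  # U+212A KELVIN SIGN
FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# A plain ASCII "K" is never turned into the Kelvin sign.
kitchen = {form: unicodedata.normalize(form, "Kitchen") for form in FORMS}

# The Kelvin sign itself is replaced by the letter K in every form.
temperature = {form: unicodedata.normalize(form, "+265" + KELVIN) for form in FORMS}

for form in FORMS:
    print(form, kitchen[form], temperature[form])
```

Every form should leave "Kitchen" untouched and map the Kelvin sign to the letter K, including NFC, which is the form that performs composition.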
The Kelvin symbol is a compatibility character (only encoded for round-trip
compatibility with legacy encodings) and normalizes to a normal K letter.
Because of its status of compatibility character, its use is already
Needless to say, your subject line is extremely disrespectful. Repeat it
again and all you'll get is another notice from the Unicode list moderator,
and maybe private insults; but you won't get more help from others. Your
first introduction to this list is really a failure: ask for information,
but please don't insult people just because you don't understand something.
UnicodeData.txt is NOT invalid, NOT flawed, NOT broken, NOT corrupt, and NOT
wrong. At least not for the Kelvin symbol you mention.
There may be issues for some languages, with rare characters, but the case of
the Kelvin symbol is well known and has long been understood: its canonical
"decomposition mapping" is not a true decomposition because it is a
"singleton", that is, a mapping to a single code point.
Singleton decomposition mappings are NOT "recomposable": they are excluded
from canonical composition.
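For a parser like the one you describe, the rule could be sketched as follows, in Python. This is only an illustration under stated assumptions: build_tables and SAMPLE are hypothetical names, and the three sample records follow the UnicodeData.txt field layout but should be checked against the real file. A complete composer must also honor the other composition exclusions (CompositionExclusions.txt and non-starter decompositions), which this sketch ignores.

```python
# Sample records in UnicodeData.txt layout (field 5 is the decomposition mapping).
SAMPLE = """\
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
212A;KELVIN SIGN;Lu;0;L;004B;;;;N;DEGREES KELVIN;;;006B;
2460;CIRCLED DIGIT ONE;No;0;ON;<circle> 0031;;1;1;N;;;;;
"""


def build_tables(text):
    """Return (canonical decompositions, composition pairs) from UnicodeData-style text."""
    decompositions = {}
    compositions = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        mapping = fields[5]
        # Empty field: no decomposition. Leading <tag>: compatibility mapping.
        if not mapping or mapping.startswith("<"):
            continue
        parts = tuple(int(value, 16) for value in mapping.split())
        decompositions[code_point] = parts
        # Only two-code-point mappings are candidates for recomposition;
        # singletons such as KELVIN SIGN -> K are decomposed but never recomposed.
        # (Assumption: the other exclusion classes are handled elsewhere.)
        if len(parts) == 2:
            compositions[parts] = code_point
    return decompositions, compositions


decompositions, compositions = build_tables(SAMPLE)
print(decompositions)
print(compositions)
```

With tables built this way, the Kelvin sign still decomposes to K, but K has no entry in the composition table, so "Kitchen" can never become "(Kelvin)itchen".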
So please reread the specs, notably the Unicode Standard and its annex
(UAX #15, Unicode Normalization Forms), which fully documents the
normalization forms.
This archive was generated by hypermail 2.1.5 : Sat Jun 11 2005 - 16:32:23 CDT