Re: UnicodeData.txt is invalid, flawed, broken, corrupt and wrong

From: Aki Inoue (
Date: Sat Jun 11 2005 - 15:09:22 CDT


    According to the Unicode code chart, KELVIN SIGN (U+212A) has a canonical
    decomposition mapping to LATIN CAPITAL LETTER K (U+004B), so the Unicode
    database is correct.

    Note that in Normalization Form C processing, you don't compose to
    characters with single-character canonical mappings (singletons), such as
    KELVIN SIGN or ANGSTROM SIGN. A plain "K" therefore never composes back
    to KELVIN SIGN.
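
    To illustrate, here is a minimal sketch in Python. It assumes the
    standard unicodedata module as a stand-in for a hand-written
    UnicodeData.txt parser; build_composition_table is a hypothetical helper
    name, not part of any library. It builds a composition table only from
    two-character canonical decompositions, so singletons never get a
    reverse (composing) entry.

```python
import unicodedata

KELVIN = "\u212A"    # KELVIN SIGN
ANGSTROM = "\u212B"  # ANGSTROM SIGN
A_RING = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points.

    unicodedata.decomposition() returns the raw decomposition field.
    A leading <tag> marks a compatibility mapping; canonical mappings
    carry no tag. Returns [] when there is no canonical mapping.
    """
    field = unicodedata.decomposition(ch)
    if not field or field.startswith("<"):
        return []
    return [int(cp, 16) for cp in field.split()]


def build_composition_table(chars):
    """Hypothetical helper: map (first, second) -> composed code point.

    Singleton decompositions (one code point) are skipped, so they are
    decomposed but never recomposed. A complete implementation would also
    have to honour CompositionExclusions.txt and skip non-starter
    decompositions; this sketch does not.
    """
    table = {}
    for ch in chars:
        parts = canonical_decomposition(ch)
        if len(parts) == 2:
            table[(parts[0], parts[1])] = ord(ch)
    return table


table = build_composition_table([KELVIN, ANGSTROM, A_RING])

# Only the two-character mapping A + COMBINING RING ABOVE is composable.
print({tuple(hex(c) for c in k): hex(v) for k, v in table.items()})

# The library's own NFC agrees: "K" stays "K", and KELVIN SIGN becomes "K".
print(unicodedata.normalize("NFC", "Kitchen") == "Kitchen")
print(unicodedata.normalize("NFC", KELVIN) == "K")
```

    With a table built this way, "Kitchen" is left alone, while KELVIN SIGN
    still decomposes to "K" in both NFD and NFC.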


    > No one from the official company replied to me last
    > time, so I'll try again.
    > Why is it that the entry for Kelvin (a measurement of temperature),
    > has a decomposition, which is listed as a canonical decomposition,
    > to the standard ASCII "K"?
    > This decomposition is actually a compatibility decomposition.
    > How does this cause me problems? I've written a parser for
    > UnicodeData.txt. This parser will extract data for decomposition,
    > and for composition also.
    > Because Kelvin canonically decomposes to K, it follows that K
    > canonically composes to Kelvin! :o(
    > So my composer will change a word like this: "Kitchen", into
    > "(Kelvin)itchen". Which is just totally wrong. All because
    > UnicodeData.txt is broken.
    > That is what I think. But I might be wrong.
    > Can someone from please confirm or deny all of this?
    > That will put my mind at rest, because I need the official answer.
    > --
    > Industrial strength string processing,
    > made easy.
    > "All things are logical. Putting free-will in the slot for premises in
    > a logical system, makes all of life both understandable, and free."

    This archive was generated by hypermail 2.1.5 : Sat Jun 11 2005 - 15:10:53 CDT