Re: UnicodeData.txt is invalid, flawed, broken, corrupt and wrong

From: Aki Inoue (aki@apple.com)
Date: Sat Jun 11 2005 - 15:09:22 CDT


    Theodore,

    According to the Unicode code charts, KELVIN SIGN (U+212A) has a
    canonical decomposition mapping to LATIN CAPITAL LETTER K (U+004B),
    so the Unicode database is correct.
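
    For anyone parsing the file, here is a rough sketch of how the
    decomposition field (the sixth semicolon-separated field) can be
    read. A mapping with no leading <tag> is canonical; one with a tag
    such as <super> is a compatibility mapping. The function name
    parse_decomposition and the SAMPLE records are only illustrative;
    the records follow the UnicodeData.txt layout but should be checked
    against the real file.

```python
# Sample records in UnicodeData.txt layout (illustrative; verify against the real file).
SAMPLE = """\
212A;KELVIN SIGN;Lu;0;L;004B;;;;N;DEGREES KELVIN;;;006B;
2122;TRADE MARK SIGN;So;0;ON;<super> 0054 004D;;;;N;TRADEMARK;;;;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
"""


def parse_decomposition(line):
    """Return (code point, tag or None, list of mapped code points)."""
    fields = line.split(";")
    code = int(fields[0], 16)
    raw = fields[5]                  # decomposition field
    if not raw:
        return code, None, []        # no decomposition at all
    parts = raw.split()
    tag = None
    if parts[0].startswith("<"):     # e.g. "<super>" marks a compatibility mapping
        tag = parts[0]
        parts = parts[1:]
    return code, tag, [int(p, 16) for p in parts]


for line in SAMPLE.splitlines():
    code, tag, mapping = parse_decomposition(line)
    kind = f"compatibility {tag}" if tag else "canonical"
    targets = " ".join(f"U+{m:04X}" for m in mapping)
    print(f"U+{code:04X}: {kind} -> {targets}")
```

    With these sample records, KELVIN SIGN comes out as canonical with
    a single-character mapping, and TRADE MARK SIGN as a compatibility
    mapping.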

    Note that in Normalization Form C processing, you never compose back
    to a character whose canonical mapping is a single character
    (a singleton decomposition), such as KELVIN SIGN or ANGSTROM SIGN.
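
    A minimal sketch of that rule, assuming the canonical mappings have
    already been extracted into a dictionary. The names
    build_composition_table and canonical are hypothetical, and a
    complete implementation would also need to honor the other
    exclusions listed in CompositionExclusions.txt. The checks at the
    end use Python's standard unicodedata module to show the expected
    behavior.

```python
import unicodedata


def build_composition_table(decompositions):
    """Build a (first, second) -> composed table from canonical mappings.

    decompositions maps a code point to its canonical mapping (a list of
    code points). Single-character mappings are skipped, so "K" is never
    turned back into KELVIN SIGN.
    """
    table = {}
    for composed, mapping in decompositions.items():
        if len(mapping) != 2:
            continue                 # singleton: decompose only, never compose
        table[tuple(mapping)] = composed
    return table


# Hypothetical input: a few canonical mappings keyed by code point.
canonical = {
    0x212A: [0x004B],                # KELVIN SIGN -> K (singleton)
    0x212B: [0x00C5],                # ANGSTROM SIGN -> A WITH RING ABOVE (singleton)
    0x00C5: [0x0041, 0x030A],        # A WITH RING ABOVE -> A + COMBINING RING ABOVE
}
table = build_composition_table(canonical)

# Behavior of a conforming normalizer, via the standard library.
assert unicodedata.normalize("NFC", "Kitchen") == "Kitchen"
assert unicodedata.normalize("NFC", "\u212A") == "K"
assert unicodedata.normalize("NFC", "\u212B") == "\u00C5"
```

    Here the table ends up with a single entry, the pair (U+0041,
    U+030A) composing to U+00C5, so "Kitchen" is left alone.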

    Aki

    >
    > No one from the official Unicode.org company replied to me last
    > time, so I'll try again.
    >
    > Why is it that the entry for Kelvin (a measurement of temperature),
    > has a decomposition, which is listed as a canonical decomposition,
    > to the standard ASCII "K"?
    >
    > This decomposition is actually a compatibility decomposition.
    >
    > How does this cause me problems? I've written a parser for
    > UnicodeData.txt. This parser will extract data for decomposition,
    > and for composition also.
    >
    > Because Kelvin canonically decomposes to K, it follows that K
    > canonically composes to Kelvin! :o(
    >
    > So my composer will change a word like this: "Kitchen", into
    > "(Kelvin)itchen". Which is just totally wrong. All because
    > UnicodeData.txt is broken.
    >
    > That is what I think. But I might be wrong.
    >
    > Can someone from Unicode.org please confirm or deny all of this?
    > That will put my mind at rest, because I need the official answer.
    >
    > --
    > http://elfdata.com/plugin/ Industrial strength string processing,
    > made easy.
    >
    > "All things are logical. Putting free-will in the slot for premises in
    > a logical system, makes all of life both understandable, and free."
    >


