Re: what exactly is UnicodeData.txt?

From: Mark Davis
Date: Thu Feb 16 2006 - 18:59:26 CST

    You have to be very careful. UnicodeData.txt is just one file of many
    that contain the data for Unicode character properties. And in a great
    many cases, the recommended property is *not* the one in UnicodeData.txt.
    For examples of this, look at the following.

    For other examples, such as determining whether letters are lowercase or
    not, see "Case Conversion" in
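The lowercase case can be sketched in Python as a rough illustration. It compares the General_Category field that UnicodeData.txt carries (value `Ll`) with a broader lowercase test. The helper names are hypothetical, and the sketch assumes that CPython's `str.islower()` is built on the derived Lowercase property rather than on General_Category alone:

```python
import unicodedata


def is_lowercase_by_category(ch: str) -> bool:
    """Test based only on General_Category, the field found in UnicodeData.txt."""
    return unicodedata.category(ch) == "Ll"


def is_lowercase_by_property(ch: str) -> bool:
    """Broader test; assumed to reflect the derived Lowercase property in CPython."""
    return ch.islower()


# U+2170 SMALL ROMAN NUMERAL ONE has General_Category Nl, not Ll.
# U+02B0 MODIFIER LETTER SMALL H has General_Category Lm, not Ll.
for ch in ["a", "\u2170", "\u02b0"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        is_lowercase_by_category(ch),
        is_lowercase_by_property(ch),
    )
```

Characters for which the two functions disagree are the ones where relying on UnicodeData.txt alone would give a different answer from the derived property.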

    (I'll be talking about these issues at the upcoming Unicode conference.)


    Kit Peters wrote:
    > On 2/16/06, *Jukka K. Korpela* wrote:
    > On Wed, 15 Feb 2006, Kit Peters wrote:
    > > I am interested in the characters whose properties are
    > > defined in UnicodeData.txt.
    > But do you really mean that? That is, do you mean Unicode characters
    > except Han characters and Hangul syllables? Why would this be a
    > relevant subset? If it is, I don't think there is any shorter
    > expression you could use.
    > The reason I am only interested right now in the characters from
    > UnicodeData.txt is that the larger project I am working on
    > (CLforJava, a pure Java Common Lisp implementation) only parses
    > UnicodeData.txt. While we eventually plan to parse Unihan.txt, at the
    > present time I am concentrating on parsing all the numbers in
    > UnicodeData.txt.
    > Besides, the formulation is vague.
    > What would be a more accurate formulation?
    > Kit Peters
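As a minimal sketch of the parsing task described in the quoted message, the following Python reads the numeric value from semicolon-separated UnicodeData.txt records. It assumes the usual layout, in which field 0 is the code point and field 8 is the numeric value (possibly a fraction such as `1/2`). The function name `parse_numeric` is hypothetical, and the sample lines are reproduced from memory for illustration rather than read from the real file:

```python
from fractions import Fraction
from typing import Optional, Tuple

# Sample records in the UnicodeData.txt layout (illustrative, not the full file).
SAMPLE_LINES = [
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
]


def parse_numeric(line: str) -> Tuple[int, Optional[Fraction]]:
    """Return (code point, numeric value), or None when field 8 is empty."""
    fields = line.split(";")
    code_point = int(fields[0], 16)
    numeric = fields[8]  # assumed position of the numeric value field
    return code_point, Fraction(numeric) if numeric else None


for line in SAMPLE_LINES:
    code_point, value = parse_numeric(line)
    print(f"U+{code_point:04X}", value)
```

This covers only characters listed individually. Han characters and Hangul syllables, mentioned above, are not itemized one per line in UnicodeData.txt, so their numeric data would have to come from elsewhere (for Han, Unihan.txt).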
