Re: what exactly is UnicodeData.txt?

From: Mark Davis (mark.davis@icu-project.org)
Date: Thu Feb 16 2006 - 18:59:26 CST


    You have to be very careful. UnicodeData.txt is just one of many files
    that contain the data for Unicode character properties, and in a great
    many cases the recommended property is *not* the one in UnicodeData.txt.
    For examples of this, see the following:

    http://www.unicode.org/reports/tr18/#Compatibility_Properties

    http://www.unicode.org/reports/tr31/

    For other examples, such as determining whether letters are lowercase or
    not, see "Case Conversion" in http://www.macchiato.com/slides/gotchas.html
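
    As a rough illustration of the lowercase case: a test based only on the
    General_Category field of UnicodeData.txt (gc = Ll) can disagree with
    the derived Lowercase property listed in DerivedCoreProperties.txt. The
    sketch below uses a few abbreviated sample rows rather than the real
    files, and the helper names (general_categories, property_ranges,
    has_property) are made up for this example.

```python
# Abbreviated sample rows; a real program would read the full data files.
UNICODE_DATA = """\
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
2170;SMALL ROMAN NUMERAL ONE;Nl;0;L;<compat> 0069;;;1;N;;;2160;;2160
24D0;CIRCLED LATIN SMALL LETTER A;So;0;L;<circle> 0061;;;;N;;;24B6;;24B6
"""

# Same idea for DerivedCoreProperties.txt; trailing comments are shortened.
DERIVED_CORE_PROPERTIES = """\
0061..007A    ; Lowercase # LATIN SMALL LETTER A..Z
2170..217F    ; Lowercase # SMALL ROMAN NUMERAL ONE..ONE THOUSAND
24D0..24E9    ; Lowercase # CIRCLED LATIN SMALL LETTER A..Z
"""


def general_categories(text):
    """Map code point -> General_Category (third field of each row)."""
    result = {}
    for line in text.splitlines():
        fields = line.split(";")
        result[int(fields[0], 16)] = fields[2]
    return result


def property_ranges(text, prop):
    """Collect (first, last) code point ranges listed for `prop`."""
    ranges = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        cps, name = (part.strip() for part in data.split(";"))
        if name != prop:
            continue
        first, _, last = cps.partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def has_property(cp, ranges):
    """True if `cp` falls inside any of the given ranges."""
    return any(first <= cp <= last for first, last in ranges)


gc = general_categories(UNICODE_DATA)
lowercase = property_ranges(DERIVED_CORE_PROPERTIES, "Lowercase")
for cp in sorted(gc):
    print(f"U+{cp:04X} gc={gc[cp]} "
          f"gc_is_Ll={gc[cp] == 'Ll'} "
          f"Lowercase={has_property(cp, lowercase)}")
```

    In this sample, U+2170 and U+24D0 have the Lowercase property even
    though their General_Category is Nl and So, so a gc = Ll test alone
    would miss them.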

    (I'll be talking about these issues at the upcoming Unicode conference.)

    Mark

    Kit Peters wrote:
    >
    >
    > On 2/16/06, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
    >
    > On Wed, 15 Feb 2006, Kit Peters wrote:
    >
    > > I am interested in the characters whose properties are
    > > defined in UnicodeData.txt.
    >
    > But do you really mean that? That is, do you mean Unicode characters
    > except Han characters and Hangul syllables? Why would this be a
    > relevant subset? If it is, I don't think there is any shorter
    > expression you could use.
    >
    >
    > The reason I am only interested right now in the characters from
    > UnicodeData.txt is that the larger project I am working on
    > (CLforJava, a pure Java Common Lisp implementation) only parses
    > UnicodeData.txt. While we eventually plan to parse Unihan.txt, at
    > present I am concentrating on parsing all the numbers in
    > UnicodeData.txt.
    >
    > Besides, the formulation is vague.
    >
    >
    > What would be a more accurate formulation?
    >
    > Kit Peters
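
    For the parsing task described in the quoted text, here is a minimal
    sketch of reading the numeric field of UnicodeData.txt. It assumes the
    usual layout of 15 semicolon-separated fields, with the numeric value
    in the ninth, and it records the <..., First> / <..., Last> rows as
    ranges, since Han characters and Hangul syllables are not listed one by
    one. The rows are a small sample (the last code point of the CJK range
    depends on the Unicode version), and parse() is a name made up for this
    example.

```python
from fractions import Fraction

# Sample rows only; the end of the CJK range varies between Unicode versions.
SAMPLE = """\
0035;DIGIT FIVE;Nd;0;EN;;5;5;5;N;;;;;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
2167;ROMAN NUMERAL EIGHT;Nl;0;L;<compat> 0056 0049 0049 0049;;;8;N;;;;2177;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FBB;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""


def parse(text):
    """Return (numeric values by code point, list of First/Last ranges)."""
    numeric = {}    # code point -> Fraction
    ranges = []     # (first, last, label)
    pending = None  # start of a range whose Last row has not been seen yet
    for line in text.splitlines():
        fields = line.split(";")
        cp, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Strip the leading "<" and the trailing ", First>" marker.
            pending = (cp, name[1:-len(", First>")])
        elif name.endswith(", Last>"):
            ranges.append((pending[0], cp, pending[1]))
            pending = None
        elif fields[8]:
            # The numeric field holds an integer or a fraction such as "1/2".
            numeric[cp] = Fraction(fields[8])
    return numeric, ranges


numeric, ranges = parse(SAMPLE)
for cp, value in sorted(numeric.items()):
    print(f"U+{cp:04X} = {value}")
for first, last, label in ranges:
    print(f"U+{first:04X}..U+{last:04X} {label} (no per-character rows)")
```

    Characters inside such a range get no individual row, so a parser
    limited to this file sees no numeric value for any of them.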


