Re: what exactly is UnicodeData.txt?

From: Mark Davis (mark.davis@icu-project.org)
Date: Thu Feb 16 2006 - 18:59:26 CST

Next message: Kit Peters: "Re: what exactly is UnicodeData.txt?"

Previous message: Travis Griggs: "CLDR/LDML questions"
In reply to: Kit Peters: "Re: what exactly is UnicodeData.txt?"
Next in thread: Kit Peters: "Re: what exactly is UnicodeData.txt?"
Reply: Kit Peters: "Re: what exactly is UnicodeData.txt?"

You have to be very careful. UnicodeData.txt is just one of many files
that contain the data for Unicode character properties. And in a great
many cases, the recommended property is *not* the one in UnicodeData.txt.
For examples of this, see the following.

http://www.unicode.org/reports/tr18/#Compatibility_Properties

http://www.unicode.org/reports/tr31/

For other examples, such as determining whether letters are lowercase or
not, see "Case Conversion" in http://www.macchiato.com/slides/gotchas.html
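
As a rough illustration of the lowercase case (a sketch, not taken from
the slides): the General_Category field in UnicodeData.txt marks a
character as a lowercase letter with the value Ll, while the derived
Lowercase property also covers characters listed under Other_Lowercase.
The snippet below assumes Python's standard unicodedata module and
assumes that str.islower() in the interpreter at hand follows the
derived property. The helper names and the sample characters are my own
choices, and the results depend on the Unicode version the interpreter
ships with.

```python
import unicodedata

# Hypothetical sample: two ordinary letters plus characters that are not
# General_Category=Ll but are commonly listed under Other_Lowercase.
SAMPLES = ["a", "A", "\u02b0", "\u2170"]


def is_category_ll(ch):
    """True if the UnicodeData.txt General_Category of ch is Ll."""
    return unicodedata.category(ch) == "Ll"


def compare_lowercase(chars):
    """Return (code point, category, gc == Ll, str.islower()) per character."""
    rows = []
    for ch in chars:
        rows.append(
            (f"U+{ord(ch):04X}", unicodedata.category(ch),
             is_category_ll(ch), ch.islower())
        )
    return rows


if __name__ == "__main__":
    for row in compare_lowercase(SAMPLES):
        print(row)
```

Wherever the last two columns disagree, a test based only on the
UnicodeData.txt category would give a different answer from the derived
property.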

(I'll be talking about these issues at the upcoming Unicode conference.)

Mark

Kit Peters wrote:
>
>
> On 2/16/06, Jukka K. Korpela <jkorpela@cs.tut.fi> wrote:
>
> On Wed, 15 Feb 2006, Kit Peters wrote:
>
> > I am interested in the characters whose properties are
> > defined in UnicodeData.txt.
>
> But do you really mean that? That is, do you mean Unicode characters
> except Han characters and Hangul syllables? Why would this be a
> relevant subset? If it is, I don't think there is any shorter
> expression you could use.
>
>
> The reason I am only interested right now in the characters from
> UnicodeData.txt is that the larger project I am working on
> (CLforJava, a pure Java Common Lisp implementation) only parses
> UnicodeData.txt. While eventually we plan to parse Unihan.txt, at the
> present time I am concentrating on parsing all the numbers in
> UnicodeData.txt.
>
> Besides, the formulation is vague.
>
>
> What would be a more accurate formulation?
>
> Kit Peters
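
For the parsing question in the quoted text, here is a minimal sketch of
reading the numeric value field from UnicodeData.txt. It assumes the
usual semicolon-separated layout in which field 8 (counting from 0)
holds the numeric value, possibly as a fraction such as 1/2, and in
which large blocks such as the Hangul syllables appear only as a pair of
"First"/"Last" lines rather than one line per character. The function
name is hypothetical, and the sample lines were copied by hand for
illustration, so they should be checked against the actual file for the
Unicode version in use.

```python
from fractions import Fraction

# Hand-copied illustrative lines; verify against the real UnicodeData.txt.
SAMPLE = """\
0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;
00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""


def parse_numeric_values(text):
    """Return ({code point: numeric value}, [(first, last) ranges])."""
    values = {}
    ranges = []
    range_start = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name = fields[1]
        # Ranges are given as two lines, not one line per character.
        if name.endswith(", First>"):
            range_start = code_point
            continue
        if name.endswith(", Last>"):
            ranges.append((range_start, code_point))
            range_start = None
            continue
        # Field 8 is empty for characters without a numeric value.
        if fields[8]:
            values[code_point] = Fraction(fields[8])
    return values, ranges


if __name__ == "__main__":
    numeric_values, code_point_ranges = parse_numeric_values(SAMPLE)
    print(numeric_values)
    print(code_point_ranges)
```

Keeping the values as Fraction objects avoids rounding entries such as
1/2. The range list is one way to see which characters are covered by
the file without having a line of their own.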

This archive was generated by hypermail 2.1.5 : Thu Feb 16 2006 - 19:21:31 CST