From: Mark Davis (email@example.com)
Date: Thu Feb 16 2006 - 18:59:26 CST
You have to be very careful. UnicodeData.txt is just one of many files
that contain the data for Unicode character properties. And in a great
many cases, the recommended property is *not* the one in UnicodeData.txt.
For examples of this, look at the following.
For other examples, such as determining whether letters are lowercase or
not, see "Case Conversion" in http://www.macchiato.com/slides/gotchas.html
(I'll be talking about these issues at the upcoming Unicode conference.)
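To make the lowercase case concrete, here is a minimal sketch in Java (the language of the CLforJava project mentioned in the quoted message). It assumes a Java 7 or later runtime, where Character.isLowerCase(int) is documented to also honor the contributory Other_Lowercase property, while Character.getType(int) reports only the General_Category that UnicodeData.txt carries. The class and method names are hypothetical, chosen only for illustration.

```java
public class LowercaseCheck {
    // Category-only test: what a parser that reads just the General_Category
    // field of UnicodeData.txt would conclude (Ll means "lowercase letter").
    public static boolean isLowercaseByCategory(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    // Property-based test: on Java 7+ this also returns true for characters
    // that carry Other_Lowercase, which is not recorded in UnicodeData.txt.
    public static boolean isLowercaseByProperty(int codePoint) {
        return Character.isLowerCase(codePoint);
    }

    public static void main(String[] args) {
        // U+0061 LATIN SMALL LETTER A, U+2170 SMALL ROMAN NUMERAL ONE,
        // U+24D0 CIRCLED LATIN SMALL LETTER A
        int[] samples = {0x0061, 0x2170, 0x24D0};
        for (int cp : samples) {
            System.out.printf("U+%04X category-only=%b property=%b%n",
                    cp, isLowercaseByCategory(cp), isLowercaseByProperty(cp));
        }
    }
}
```

U+2170 has General_Category Nl and U+24D0 has So, so the category-only test rejects both, yet both are lowercase according to the derived property. A parser that stops at UnicodeData.txt would miss them.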
Kit Peters wrote:
> On 2/16/06, Jukka K. Korpela <firstname.lastname@example.org> wrote:
> On Wed, 15 Feb 2006, Kit Peters wrote:
> > I am interested in the characters whose properties are
> > defined in UnicodeData.txt.
> But do you really mean that? That is, do you mean Unicode characters
> except Han characters and Hangul syllables? Why would this be a
> relevant subset? If it is, I don't think there is any shorter
> you could use.
> The reason I am only interested right now in the characters from
> UnicodeData.txt is that the larger project I am working on
> (CLforJava, a pure Java Common Lisp implementation) only parses
> UnicodeData.txt. While eventually we plan to parse Unihan.txt, at the
> present time I am concentrating on parsing all the numbers in
> Besides, the formulation is vague.
> What would be a more accurate formulation?
> Kit Peters