From: Elliotte Harold (elharo@metalab.unc.edu)
Date: Wed Mar 09 2005 - 09:36:31 CST
I'm looking at the sample code for performing NFC (specifically
recomposition) found at
http://www.unicode.org/reports/tr15/Normalizer.html and my initial
impression is that this isn't going to work for Unicode 4.0 because it's
pretty much ignoring the issues of surrogate pairs. That is, it seems to
be operating on Java chars rather than on Unicode code points.
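To make the char versus code point distinction concrete, here is a minimal sketch, assuming Java 5 or later (where String.codePointAt and Character.charCount exist). The class and method names are made up for illustration and are not part of the sample code in the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the TR15 sample code.
final class CodePointWalk {

    // Collects the code points of s, treating each valid surrogate pair
    // as one code point instead of two separate chars.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<Integer>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads one or two chars
            result.add(cp);
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D15E, written as its surrogate pair.
        String s = "a\uD834\uDD5E";
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + codePoints(s).size());
    }
}
```

A loop written as for (int i = 0; i < s.length(); i++) with s.charAt(i) would instead see the two surrogates as separate, unrelated values.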
Am I missing something? Is this feasible? For instance, if no
characters from beyond the BMP were ever combined with anything in NFC,
then one could simply return the surrogate pairs without ever
recombining them. However, if there are any recompositions in Plane 1 or
2, then this sample code might need to be updated.
Hmm, it looks like the decompose functions in the sample code also only
operate on chars, not ints; and it's recently been pointed out here that
some characters beyond the BMP do in fact decompose, so it really
looks like this code is out of date. Is there any chance it will be
updated to properly handle surrogate pairs?
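As a rough sketch of what an int-based decompose step might look like, assuming Java 5 or later: the class name, the method signature, and the one-entry table below are all hypothetical. Real mappings would be loaded from UnicodeData.txt, and a real implementation would also need recursive decomposition and canonical reordering, both omitted here. The single entry is U+1D15E MUSICAL SYMBOL HALF NOTE, which decomposes canonically to U+1D157 U+1D165.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the TR15 sample code's API.
final class IntDecompose {

    // One-entry stand-in for the real decomposition data, keyed by
    // code point (int) rather than by char.
    private static final Map<Integer, int[]> TABLE =
            new HashMap<Integer, int[]>();
    static {
        // U+1D15E MUSICAL SYMBOL HALF NOTE -> U+1D157 U+1D165
        TABLE.put(0x1D15E, new int[] {0x1D157, 0x1D165});
    }

    // Single-pass decomposition over code points. No recursion and no
    // canonical reordering: this only shows the int-based lookup.
    static String decompose(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            int[] mapping = TABLE.get(cp);
            if (mapping == null) {
                out.appendCodePoint(cp);    // no mapping: copy unchanged
            } else {
                for (int m : mapping) {
                    out.appendCodePoint(m); // re-encodes as surrogates if needed
                }
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

A table keyed by char could never contain the U+1D15E entry, since that code point does not fit in 16 bits.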
-- 
Elliotte Rusty Harold  elharo@metalab.unc.edu
XML in a Nutshell 3rd Edition Just Published!
http://www.cafeconleche.org/books/xian3/
http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim