From: Mark Davis (mark.davis@jtcsv.com)
Date: Wed Mar 09 2005 - 12:57:26 CST
On the plate for 4.1 is updating that code. Markus Scherer and Vladimir
Weinstein provided fixed code, but it slipped through the cracks in the
previous versions.
Mark
----- Original Message -----
From: "Elliotte Harold" <elharo@metalab.unc.edu>
To: "Unicode List" <unicode@unicode.org>
Sent: Wednesday, March 09, 2005 07:36
Subject: Sample code for NFC and Plane 1 characters
> I'm looking at the sample code for performing NFC (specifically
> recomposition) found at
> http://www.unicode.org/reports/tr15/Normalizer.html and my initial
> impression is that this isn't going to work for Unicode 4.0 because it's
> pretty much ignoring the issues of surrogate pairs. That is, it seems to
> be operating on Java chars rather than on Unicode code points.
>
> Am I missing something? Is this feasible? For instance, if no
> characters from beyond the BMP were ever combined with anything in NFC,
> then one could simply return the surrogate pairs without ever
> recombining them. However, if there are any recompositions in Plane 1 or
> 2, then this sample code might need to be updated.
>
> Hmm, it looks like the decompose functions in the sample code also only
> operate on chars, not ints; and it's recently been pointed out here that
> some characters beyond the BMP do in fact decompose, so it really
> looks like this code is out of date. Is there any chance it will be
> updated to properly handle surrogate pairs?
>
> --
> Elliotte Rusty Harold elharo@metalab.unc.edu
> XML in a Nutshell 3rd Edition Just Published!
> http://www.cafeconleche.org/books/xian3/
> http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim
>
>
>
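
To make the char-versus-code-point concern in the quoted message concrete,
here is a minimal Java sketch. It is not the TR15 sample code and not the
fixed code mentioned in the reply. The class name CodePointSketch and its
helper methods are hypothetical, and it assumes a JDK that ships
java.text.Normalizer (Java 6 or later). It takes U+1D15E MUSICAL SYMBOL
HALF NOTE, a character beyond the BMP that has a canonical decomposition,
and shows that it occupies two Java chars but is one code point, and that
decomposing it yields two further supplementary code points.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class CodePointSketch {

    /** Walks a UTF-16 string by code point, so a surrogate pair counts once. */
    static List<Integer> toCodePoints(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // joins a high/low surrogate pair
            out.add(cp);
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return out;
    }

    /** Canonical decomposition using the JDK normalizer (an assumption, not TR15's sample). */
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    /** Formats code points as U+XXXX for display. */
    static String describe(List<Integer> codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int cp : codePoints) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", cp));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D15E lies in Plane 1, so Java stores it as a surrogate pair.
        String halfNote = new String(Character.toChars(0x1D15E));

        System.out.println("chars:       " + halfNote.length());
        System.out.println("code points: " + describe(toCodePoints(halfNote)));
        System.out.println("NFD:         " + describe(toCodePoints(nfd(halfNote))));
    }
}
```

A loop that examines one char at a time would see only the two surrogates
of U+1D15E and would never find its decomposition, which is the failure
mode the quoted message describes.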