From: Mark Davis (email@example.com)
Date: Wed Mar 09 2005 - 12:57:26 CST
On the plate for 4.1 is updating that code. Markus Scherer and Vladimir
Weinstein provided fixed code, but it slipped through the cracks in the
----- Original Message -----
From: "Elliotte Harold" <firstname.lastname@example.org>
To: "Unicode List" <email@example.com>
Sent: Wednesday, March 09, 2005 07:36
Subject: Sample code for NFC and Plane 1 characters
> I'm looking at the sample code for performing NFC (specifically
> recomposition) found at
> http://www.unicode.org/reports/tr15/Normalizer.html and my initial
> impression is that this isn't going to work for Unicode 4.0 because it's
> pretty much ignoring the issues of surrogate pairs. That is, it seems to
> be operating on Java chars rather than on Unicode code points.
> Am I missing something? Is this feasible? For instance, if no
> characters from beyond the BMP were ever combined with anything in NFC,
> then one could simply return the surrogate pairs without ever
> recombining them. However, if there are any recompositions in Plane 1 or
> 2, then this sample code might need to be updated.
> Hmm, it looks like the decompose functions in the sample code also only
> operate on chars, not ints; and it's recently been pointed out here that
> some characters beyond the BMP do in fact decompose, so it really
> looks like this code is out of date. Is there any chance it will be
> updated to properly handle surrogate pairs?
> Elliotte Rusty Harold firstname.lastname@example.org
> XML in a Nutshell 3rd Edition Just Published!
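
To illustrate the char-versus-code-point concern raised in the quoted message, here is a minimal sketch in Java. It assumes a JDK that ships java.text.Normalizer (added to the JDK after this thread, and not the TR15 sample Normalizer under discussion). The class name SupplementaryNfdSketch and the helpers codePoints and nfd are hypothetical, chosen only for this example. It takes U+1D15E MUSICAL SYMBOL HALF NOTE, a Plane 1 character with a canonical decomposition, and walks the decomposed string by code point rather than by char.

```java
import java.text.Normalizer;

// Illustrative sketch only; not the TR15 sample code.
final class SupplementaryNfdSketch {

    // Hypothetical helper: return the code points of s, reading each
    // surrogate pair as a single supplementary code point.
    static int[] codePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // pairs surrogates when present
            out[n++] = cp;
            i += Character.charCount(cp);   // advance 1 or 2 chars
        }
        return out;
    }

    // Hypothetical helper: canonical decomposition via the JDK normalizer.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // U+1D15E is stored as two Java chars (one surrogate pair).
        String halfNote = new String(Character.toChars(0x1D15E));
        System.out.println("chars: " + halfNote.length());

        // Print each code point of the decomposed form.
        for (int cp : codePoints(nfd(halfNote))) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

A loop that indexes the string with charAt would see four unrelated surrogate values in the decomposed form instead of the two code points U+1D157 and U+1D165, which is the kind of mismatch the message describes.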