Re: Sample code for NFC and Plane 1 characters

From: Mark Davis
Date: Wed Mar 09 2005 - 12:57:26 CST


    On the plate for 4.1 is updating that code. Markus Scherer and Vladimir
    Weinstein provided fixed code, but it slipped through the cracks in the
    previous versions.
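
    To illustrate the issue raised in the quoted message below, here is a
    minimal sketch (not the updated sample code itself) of walking a Java
    string by code point rather than by char. The class and method names
    are hypothetical. It assumes the code point methods added in Java 5
    (String.codePointAt, Character.charCount, Character.toChars) and uses
    java.text.Normalizer, a later JDK API, only to show that a character
    beyond the BMP such as U+1D15E does decompose.

```java
import java.text.Normalizer;

class SurrogateDemo {
    // Convert a UTF-16 string to an array of Unicode code points,
    // joining each surrogate pair into a single int.
    static int[] toCodePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);       // reads one or two chars
            out[n++] = cp;
            i += Character.charCount(cp);    // advance by 1 or 2 chars
        }
        return out;
    }

    // Canonical decomposition (NFD) of a single code point, returned as
    // code points. Relies on the JDK's normalizer, not the sample code.
    static int[] nfd(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return toCodePoints(Normalizer.normalize(s, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        int halfNote = 0x1D15E; // MUSICAL SYMBOL HALF NOTE, in Plane 1
        String s = "A" + new String(Character.toChars(halfNote));

        // A char-based loop sees three units; a code point loop sees two.
        System.out.println("chars: " + s.length());
        System.out.println("code points: " + toCodePoints(s).length);

        // Print the decomposition of the Plane 1 character.
        for (int cp : nfd(halfNote)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

    A normalizer that indexes its tables by char would treat the two
    surrogates of U+1D15E as separate, unrelated units and so would never
    find its decomposition; indexing by int code point avoids that.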


    ----- Original Message -----
    From: "Elliotte Harold"
    To: "Unicode List"
    Sent: Wednesday, March 09, 2005 07:36
    Subject: Sample code for NFC and Plane 1 characters

    > I'm looking at the sample code for performing NFC (specifically
    > recomposition) found at
    > and my initial
    > impression is that this isn't going to work for Unicode 4.0 because it's
    > pretty much ignoring the issues of surrogate pairs. That is, it seems to
    > be operating on Java chars rather than on Unicode code points.
    > Am I missing something? Is this feasible? For instance, if no
    > characters from beyond the BMP were ever combined with anything in NFC,
    > then one could simply return the surrogate pairs without ever
    > recombining them. However, if there are any recompositions in Plane 1 or
    > 2, then this sample code might need to be updated.
    > Hmm, it looks like the decompose functions in the sample code also only
    > operate on chars, not ints; and it's recently been pointed out here that
    > some characters beyond the BMP do in fact decompose, so it really
    > looks like this code is out of date. Is there any chance it will be
    > updated to properly handle surrogate pairs?
    > --
    > Elliotte Rusty Harold
    > XML in a Nutshell 3rd Edition Just Published!
