Sample code for NFC and Plane 1 characters

From: Elliotte Harold (elharo@metalab.unc.edu)
Date: Wed Mar 09 2005 - 09:36:31 CST


    I'm looking at the sample code for performing NFC (specifically
    recomposition) found at
    http://www.unicode.org/reports/tr15/Normalizer.html, and my initial
    impression is that it isn't going to work for Unicode 4.0 because it
    largely ignores surrogate pairs. That is, it seems to operate on Java
    chars (UTF-16 code units) rather than on Unicode code points.
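    For contrast, here is a minimal sketch of walking a Java string by code
    point instead of by char, using the code point methods that Java 5 added
    to String and Character. The class name CodePointWalk and the helper
    toCodePoints are illustrative only, not part of the TR15 sample code.

```java
public class CodePointWalk {
    // Collects the code points of s, joining each surrogate pair into one int.
    static int[] toCodePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads a full pair if one starts at i
            result[n++] = cp;
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return result;
    }

    public static void main(String[] args) {
        // "a" followed by U+1D15E, which UTF-16 stores as the pair D834 DD5E.
        String s = "a\uD834\uDD5E";
        System.out.println(s.length());                  // 3 chars
        System.out.println(toCodePoints(s).length);      // 2 code points
        System.out.printf("U+%X%n", toCodePoints(s)[1]); // U+1D15E
    }
}
```

    Code that indexes by char sees D834 and DD5E as two separate units,
    which is exactly where a char-based normalizer goes wrong.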

    Am I missing something? Could this still work? For instance, if no
    characters from beyond the BMP were ever combined with anything in NFC,
    then one could simply pass surrogate pairs through without ever
    recombining them. However, if there are any recompositions in Plane 1
    or 2, then this sample code will need to be updated.
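    To show what recomposition over code points (rather than chars) might
    look like, here is a simplified sketch, assuming Java 8 or later for
    String.codePoints(). The class CodePointComposer and its two-entry
    table are hypothetical; a real normalizer would load every primary
    composite from UnicodeData.txt minus CompositionExclusions.txt, and
    would also handle canonical combining classes and blocking, which this
    sketch ignores. The table's Plane 1 pair, U+11099 + U+110BA -> U+1109A
    (KAITHI LETTER DDDHA), comes from a Unicode version later than 4.0.

```java
import java.util.HashMap;
import java.util.Map;

public class CodePointComposer {
    // Hypothetical toy table: (first, second) -> composite.
    private static final Map<Long, Integer> PAIRS = new HashMap<>();
    static {
        PAIRS.put(key(0x0065, 0x0301), 0x00E9);    // e + combining acute -> e-acute
        PAIRS.put(key(0x11099, 0x110BA), 0x1109A); // Kaithi DDHA + nukta -> DDDHA
    }

    // Code points need at most 21 bits, so a pair packs into one long.
    private static long key(int first, int second) {
        return ((long) first << 21) | second;
    }

    // Composes adjacent code point pairs only; no combining-class checks.
    public static String compose(String s) {
        int[] cps = s.codePoints().toArray();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < cps.length) {
            int current = cps[i++];
            // Keep absorbing following code points while a composite exists.
            while (i < cps.length) {
                Integer composite = PAIRS.get(key(current, cps[i]));
                if (composite == null) break;
                current = composite;
                i++;
            }
            out.appendCodePoint(current); // writes a surrogate pair if needed
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String kaithi = new String(Character.toChars(0x11099))
                + new String(Character.toChars(0x110BA));
        String composed = compose(kaithi);
        System.out.printf("U+%X, %d chars%n", composed.codePointAt(0), composed.length());
    }
}
```

    Because the lookup keys are whole code points, the Plane 1 pair composes
    the same way as the BMP one; a char-based table could not even express
    that key.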

    Hmm, it looks like the decompose functions in the sample code also
    operate only on chars, not ints, and it was recently pointed out here
    that some characters beyond the BMP do in fact decompose, so this code
    really does look out of date. Is there any chance it will be updated to
    handle surrogate pairs properly?
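    One such character is U+1D15E MUSICAL SYMBOL HALF NOTE, which decomposes
    to U+1D157 U+1D165 and is listed in CompositionExclusions.txt, so NFC
    leaves it decomposed as well. The sketch below checks this with
    java.text.Normalizer, which only arrived in Java 6, after this message;
    it also assumes Java 8 or later for String.codePoints(). The class name
    and the hex helper are illustrative.

```java
import java.text.Normalizer;

public class SupplementaryDecomposition {
    // Formats each code point (not each char) as U+XXXX.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+1D15E: one code point, two Java chars.
        String halfNote = new String(Character.toChars(0x1D15E));
        String nfd = Normalizer.normalize(halfNote, Normalizer.Form.NFD);
        String nfc = Normalizer.normalize(halfNote, Normalizer.Form.NFC);
        System.out.println(hex(nfd)); // U+1D157 U+1D165
        System.out.println(hex(nfc)); // U+1D157 U+1D165 (composition exclusion)
    }
}
```

    A decomposer that reads one char at a time would look up D834 and DD5E
    separately and find no mapping for either, silently skipping this
    decomposition.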

    -- 
    Elliotte Rusty Harold  elharo@metalab.unc.edu
    XML in a Nutshell 3rd Edition Just Published!
    http://www.cafeconleche.org/books/xian3/
    http://www.amazon.com/exec/obidos/ISBN=0596007647/cafeaulaitA/ref=nosim