Re: Small Java implementation of NFC

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Fri Mar 04 2005 - 09:39:21 CST


    But you do have to DEcompose them, right? That is, NFC(x) is the same as
    NFD(x), if x was added after Unicode 3.0. I mean, you can't just ignore them
    altogether.

    Is that right?
    Jill
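Jill's point can be checked with java.text.Normalizer. The JDK has shipped it since Java 6, which is later than this thread, and it is used here only as a stand-in, not as anything proposed in the discussion. The sketch normalizes U+1D15E MUSICAL SYMBOL HALF NOTE to both forms and compares them. The class name HalfNoteCheck and its hex helper are hypothetical.

```java
import java.text.Normalizer;

public class HalfNoteCheck {
    // Renders the code points of s as "U+XXXX" separated by spaces, for readable output.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // U+1D15E MUSICAL SYMBOL HALF NOTE, stored as a surrogate pair in UTF-16.
        String halfNote = new String(Character.toChars(0x1D15E));

        String nfc = Normalizer.normalize(halfNote, Normalizer.Form.NFC);
        String nfd = Normalizer.normalize(halfNote, Normalizer.Form.NFD);

        System.out.println("NFC: " + hex(nfc));
        System.out.println("NFD: " + hex(nfd));
        // The character is a composition exclusion, so NFC keeps the decomposed form.
        System.out.println("NFC equals NFD: " + nfc.equals(nfd));
        // NFC still changes the input, so it cannot simply be passed through.
        System.out.println("unchanged by NFC: " + nfc.equals(halfNote));
    }
}
```

Both forms should come out as U+1D157 U+1D165, so NFC(x) equals NFD(x) here while still differing from x itself.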

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    Behalf Of Andrew C. West
    Sent: 04 March 2005 14:23
    To: unicode@unicode.org
    Cc: elharo@metalab.unc.edu
    Subject: Re: Small Java implementation of NFC

    Elliotte Harold wrote:
    >
    > Are there any decomposable characters beyond the BMP?

    Yes, 13 musical symbols at U+1D15E..U+1D164 and U+1D1BB..U+1D1C0.
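One rough way to confirm the count is to scan the Musical Symbols block (U+1D100..U+1D1FF) and keep every assigned code point whose NFD differs from the character itself. This sketch assumes the JDK's Normalizer and Character tables are an adequate stand-in for the Unicode Character Database; the class and method names are hypothetical. It covers only this one block: scanning all supplementary planes would also report other characters with canonical decompositions, such as the CJK Compatibility Ideographs Supplement at U+2F800..U+2FA1D.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class MusicalDecompositions {
    // Collects assigned code points in [from, to] whose NFD differs from the
    // character itself, i.e. characters that have a canonical decomposition.
    static List<Integer> decomposable(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isDefined(cp)) {
                continue; // skip unassigned code points
            }
            String s = new String(Character.toChars(cp));
            if (!Normalizer.normalize(s, Normalizer.Form.NFD).equals(s)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Musical Symbols block: U+1D100..U+1D1FF.
        List<Integer> found = decomposable(0x1D100, 0x1D1FF);
        for (int cp : found) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
        System.out.println(found.size() + " decomposable characters");
    }
}
```

On a JDK whose Unicode data includes this block, the last line should report 13, matching the two ranges above.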

    > Or any characters that would need to be recomposed with other characters?

    But there are no characters beyond the BMP that will ever be recomposed using
    NFC.

    Unicode Standard Annex #15 (http://www.unicode.org/reports/tr15/) specifies
    that precomposed characters that are added after Unicode 3.0 are excluded
    from composition (i.e. not recomposed when NFC is applied to them). As all
    characters beyond the BMP were added in Unicode 3.1 or later, you can
    effectively ignore any character greater than U+FFFF (or any surrogates if
    you are processing UTF-16) when applying NFC to a text stream.
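Skipping supplementary characters also ignores their canonical combining classes. U+1D165 MUSICAL SYMBOL COMBINING STEM has combining class 216 and U+0301 COMBINING ACUTE ACCENT has 230, so "a" + stem + acute is canonically equivalent to "a" + acute + stem, and in both orders the stem does not block the accent from composing with the base letter. An implementation that treated the surrogate pair as an opaque starter would leave "a" + stem + acute unchanged. The sketch below, again assuming java.text.Normalizer as a stand-in and using a hypothetical class name, checks that both orders normalize to U+00E1 followed by the stem.

```java
import java.text.Normalizer;

public class SupplementaryMarkOrder {
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String stem = new String(Character.toChars(0x1D165)); // MUSICAL SYMBOL COMBINING STEM, class 216
        String acute = "\u0301";                                // COMBINING ACUTE ACCENT, class 230

        String accentFirst = "a" + acute + stem;
        String stemFirst = "a" + stem + acute;

        // Canonically equivalent inputs must give the same NFC result.
        System.out.println(nfc(accentFirst).equals(nfc(stemFirst)));
        // The stem (216) does not block the accent (230), so a + U+0301 composes to U+00E1.
        System.out.println(nfc(stemFirst).equals("\u00E1" + stem));
    }
}
```

Both lines should print true; a normalizer that passes the surrogate pair through untouched would fail the second check.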

    Andrew



    This archive was generated by hypermail 2.1.5 : Fri Mar 04 2005 - 11:14:38 CST