Re: Does Unicode 4.1 change NFC?

From: Peter Kirk
Date: Mon Apr 04 2005 - 11:04:20 CST

    On 04/04/2005 16:33, Marcin 'Qrczak' Kowalczyk wrote:

    >Peter Kirk writes:
    >>There is a serious danger of breaking existing implementations
    >>(especially those which only fully support the BMP) by introducing a
    >>BMP character which normalises to outside the BMP. For the BMP is now
    >>no longer a closed subset of Unicode, under operations like
    >>normalisation which existing implementations expected to find closed.
    >I had to change my implementation of normalization because of this
    >(my static tables of canonical decompositions use 16-bit entries for
    >BMP blocks), but it was not a big deal.
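The closure problem described in the quoted text can be checked directly. The following is a minimal sketch, not anyone's actual implementation: it scans the BMP with Python's standard `unicodedata` module and lists every BMP character whose normalized form contains a code point above U+FFFF. The helper names `bmp_escapes` and `fits_16_bit` are hypothetical, and the result depends on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`).

```python
import struct
import unicodedata


def bmp_escapes(form="NFC"):
    """Return (code point, normalized string) pairs for every BMP character
    whose normalization under `form` contains a code point above U+FFFF."""
    found = []
    for cp in range(0x10000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters; skip them
        out = unicodedata.normalize(form, chr(cp))
        if any(ord(ch) > 0xFFFF for ch in out):
            found.append((cp, out))
    return found


def fits_16_bit(code_points):
    """Return True if every code point can be stored in an unsigned 16-bit
    table entry, which is the assumption a BMP-only table makes."""
    try:
        for cp in code_points:
            struct.pack("<H", cp)  # "<H" is exactly 16 bits wide
        return True
    except struct.error:
        return False


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp, out in bmp_escapes("NFC"):
        targets = " ".join(f"U+{ord(ch):04X}" for ch in out)
        print(f"U+{cp:04X} -> {targets}  fits in 16 bits: "
              f"{fits_16_bit([ord(ch) for ch in out])}")
```

Any row this prints is a case where a table with 16-bit entries, as mentioned above, cannot hold the decomposition without widening the entries or adding an escape mechanism.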

    Thank you for the confirmation that this has required a change to
    existing code. It wasn't a big deal for you because you knew and
    understood what was happening - and presumably because you already had
    some support for non-BMP characters. It may be a big deal for others who
    are less well informed, or for code which is in the field and not being
    maintained properly, as well as for the many existing implementations
    which do not support anything outside the BMP.

    >I'm more concerned with killing the myth that Unicode is a 16-bit
    >encoding than with that minor inconvenience.

    I agree that this myth needs killing, but is it really worth the risk of
    killing with it the millions of computers which have been programmed
    according to this myth?

    Peter Kirk
