RE: outside decomposed, inside precomposed

From: Jon Hanna
Date: Wed Oct 13 2004 - 04:15:51 CST

    > imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get remapped
    > internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
    > Is this kind of behavior what one would expect?

    That's conformant. If it causes problems with any other process (including
    other processes that are part of the system in question), then that other
    process isn't complying with conformance clause C9.

    At a guess, I'd say it's probably normalising to NFC, which is advantageous
    in a lot of ways (for example, you should do this with data that has to
    conform with the web's [draft] character model).
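
As an illustration of the remapping described in the quoted question, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the application is doing NFC normalisation, which is only a guess; the variable names are just for illustration.

```python
import unicodedata

# The decomposed sequence from the quoted message: <e, combining tilde>.
decomposed = "\u0065\u0303"

# NFC composes canonically equivalent sequences where a precomposed
# character exists.
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in composed])  # ['U+1EBD']
print(unicodedata.name(composed))             # LATIN SMALL LETTER E WITH TILDE

# NFD reverses the step; the two forms are canonically equivalent.
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Both strings represent the same abstract text, so a conformant process should not interpret them differently.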

    One of the clearest advantages is that it makes searching a lot more
    efficient, as only one of the potentially very many canonically equivalent
    sequences will have to be searched for (though case-insensitive and/or
    diacritic-insensitive searches will still have many possible matching
    sequences).
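
To make the search point concrete, here is a rough Python sketch. It assumes stored data is already in NFC, so only the query has to be normalised; `nfc` and `contains` are hypothetical helper names, not part of any product mentioned here.

```python
import unicodedata

def nfc(text: str) -> str:
    """Return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def contains(stored_nfc: str, query: str) -> bool:
    """Substring test against data assumed to be stored in NFC."""
    return nfc(query) in stored_nfc

stored = nfc("p\u1ebdn")   # stored with precomposed U+1EBD
query = "pe\u0303n"        # user types <e, combining tilde>

print(query in stored)           # False: a naive code point comparison misses
print(contains(stored, query))   # True: one normalised form to search for
```

Without a single stored form, the search would have to try every canonically equivalent spelling of the query.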

    On the other hand there are potential security risks with such
    normalisation, and perhaps therefore it is something that should be

    > It's problematic (and buglike) for at least one reason: one needs to
    > put all these precomposed things in one's font, or FileMaker doesn't
    > display them properly.

    That's where the problem lies, not in the normalisation.

    > I'm assuming it will export the data in decomposed form ... but haven't
    > actually tried that yet ...

    I wouldn't assume anything of the sort. Normalising to NFD would be quite

    > BTW, this application supports import of UTF-8, but will not export
    > UTF-8. That's odd, isn't it? It'll only export UTF-16 (it's internal
    > storage form).

    Odd indeed.
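
If UTF-8 is needed downstream, the UTF-16 export can be re-encoded afterwards. The following Python sketch assumes the exported data begins with a byte order mark, which has not been verified for this application; `utf16_to_utf8` is a hypothetical helper name.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8.

    The "utf-16" codec reads the byte order from a leading BOM
    (assumed to be present in the export).
    """
    return data.decode("utf-16").encode("utf-8")

# Simulated export: Python's "utf-16" encoder prepends a BOM.
exported = "p\u1ebdn".encode("utf-16")

print(utf16_to_utf8(exported))  # b'p\xe1\xba\xbdn'
```

Re-encoding changes only the encoding form; it does not change whether the text is precomposed or decomposed.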

    Jon Hanna
