From: Jon Hanna (firstname.lastname@example.org)
Date: Wed Oct 13 2004 - 04:15:51 CST
> imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get
> internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
> Is this kind of behavior what one would expect?
That's conformant. If it causes problems for any other process (including
other processes that are part of the system in question), then that other
process isn't complying with conformance clause C9.
At a guess I'd say it's probably normalising to NFC, which is advantageous in
a lot of ways (for example, you should do this with data that has to conform
with the web's [draft] character model).
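
To make the behaviour in question concrete, here is a minimal sketch using Python's standard unicodedata module. It assumes the application really is normalising to NFC, which is only a guess. The variable names are illustrative and have nothing to do with FileMaker's own internals.

```python
import unicodedata

# <e, combining tilde> as imported: U+0065 followed by U+0303
decomposed = "e\u0303"

# NFC composes the pair into the single precomposed code point U+1EBD
composed = unicodedata.normalize("NFC", decomposed)

# The two strings are canonically equivalent but differ code point by code point
same_code_points = (decomposed == composed)

# NFD goes the other way and restores the decomposed sequence
round_trip = unicodedata.normalize("NFD", composed)
```

A process that treats the composed and decomposed strings as different text is the one relying on a distinction that C9 says it should not assume.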
One of the clearest advantages is that it makes searching a lot more
efficient, as only one of the potentially very many canonically equivalent
sequences will have to be searched for (though case-insensitive and/or
diacritical-insensitive searches will still have many possible matching
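
As a rough illustration of the searching point, the sketch below normalises both the stored text and the search term to NFC, so only one spelling has to be matched. The helper name nfc_contains is hypothetical, and plain substring matching is an assumption made for brevity; it does not attempt case-insensitive or diacritic-insensitive matching.

```python
import unicodedata

def nfc_contains(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack once both are in NFC."""
    norm_haystack = unicodedata.normalize("NFC", haystack)
    norm_needle = unicodedata.normalize("NFC", needle)
    return norm_needle in norm_haystack

stored = "x\u1ebdy"   # text held in precomposed (NFC) form
query = "e\u0303"     # the same letter typed as <e, combining tilde>

raw_match = query in stored                   # compares code points only
normalised_match = nfc_contains(stored, query)
```

If the stored data is already known to be in NFC, only the query needs normalising, which is where the efficiency gain comes from.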
On the other hand there are potential security risks with such
normalisation, and perhaps therefore it is something that should be
> It's problematic (and buglike) for at least one reason: one needs to
> put all these precomposed things in one's font, or FileMaker doesn't
> display them properly.
That's where the problem lies, not in the normalisation.
> I'm assuming it will export the data in decomposed form ...
> but haven't
> actually tried that yet ...
I wouldn't assume anything of the sort. Normalising to NFD would be quite
> BTW, this application supports import of UTF-8, but will not export
> UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal
> storage form).