From: Richard Cook (firstname.lastname@example.org)
Date: Wed Oct 13 2004 - 09:11:11 CST
Thanks for your reply.
On Oct 13, 2004, at 3:15 AM, you wrote:
>> imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get
>> remapped internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
>> Is this kind of behavior what one would expect?
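For reference, the remapping described above matches what canonical composition (NFC) does to that sequence. A minimal sketch using Python's standard unicodedata module; it only illustrates the normalization itself and assumes nothing about how FileMaker implements it:

```python
import unicodedata

# <e, combining tilde>, the decomposed sequence from the imported data
decomposed = "\u0065\u0303"

# NFC composes the pair into the single precomposed character
composed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in composed])  # ['U+1EBD']
print(unicodedata.name(composed))             # LATIN SMALL LETTER E WITH TILDE
```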
> That's conformant, if it causes problems with any other process
> other processes that are part of the system in question)
Like, for example, a rendering process?
> then that other
> process isn't complying with conformance clause C9.
> At a guess I'd say it's probably normalising to NFC which is
> advantageous in
> a lot of ways (for example you should do this with data that has to
> conform with the web's [draft] character model).
> One of the clearest advantages is that it makes searching a lot more
> efficient, as only one of the potentially very many canonically
> equivalent sequences will have to be searched for
> (though case-insensitive and/or
> diacritical-insensitive searches will still have many possible matching
> On the other hand there are potential security risks with such
> normalisation, and perhaps therefore it is something that should be
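The searching point quoted above can be sketched as follows: if both the stored text and the query are brought to the same normalization form, a plain substring search finds canonically equivalent spellings. This is only an illustration in Python; the helper name nfc_find and the sample strings are hypothetical.

```python
import unicodedata

def nfc_find(haystack: str, needle: str) -> int:
    """Search after normalizing both sides to NFC.

    The returned index refers to the NFC form of haystack, or -1 if absent.
    """
    normalized_haystack = unicodedata.normalize("NFC", haystack)
    normalized_needle = unicodedata.normalize("NFC", needle)
    return normalized_haystack.find(normalized_needle)

text = "pe\u0303 ta"   # stored in decomposed form
query = "\u1ebd"       # typed in precomposed form

print(text.find(query))       # -1, the raw code point sequences differ
print(nfc_find(text, query))  # 1
```

Case-insensitive or diacritic-insensitive matching would need further folding on top of this.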
>> It's problematic (and buglike) for at least one reason: one needs to
>> put all these precomposed things in one's font, or FileMaker doesn't
>> display them properly.
> That's where the problem lies, not in the normalisation.
Maybe they ought to be rendering the glyphs according to the characters
in the font, with a fallback via decomposition. If they normalize and
simply throw up the missing character empty box, this is not very
I built a tidy IPA transcription font, lacking many precomposed things.
Importing and exporting a data subset in FM7 reveals a total of 113
characters not displaying properly. This is annoying, to say the least.
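The fallback idea above (use the precomposed glyph if the font has it, otherwise try the decomposed sequence) can be sketched roughly like this. It is not how FileMaker or any particular renderer works; render_plan and the coverage set are hypothetical, and the coverage would in practice come from the font's character map.

```python
import unicodedata

def render_plan(text, font_coverage):
    """Decide, per NFC character, how it could be displayed.

    Returns (character, strategy) pairs where strategy is 'direct',
    'decomposed' (every part of the NFD form is in the font) or 'missing'.
    """
    plan = []
    for ch in unicodedata.normalize("NFC", text):
        if ord(ch) in font_coverage:
            plan.append((ch, "direct"))
            continue
        parts = unicodedata.normalize("NFD", ch)
        if parts != ch and all(ord(p) in font_coverage for p in parts):
            plan.append((ch, "decomposed"))
        else:
            plan.append((ch, "missing"))
    return plan

# Hypothetical small font: base letters and combining tilde, no U+1EBD.
coverage = {ord("e"), ord("p"), 0x0303}
plan = render_plan("p\u1ebd\u00e7", coverage)
print([strategy for _, strategy in plan])  # ['direct', 'decomposed', 'missing']
```

Counting the 'missing' and 'decomposed' entries over a data export would give the number of characters a small font cannot show directly.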
One reason I wanted a *small* font is that in PDF generation big fonts
may not always be subsetted properly, and even a single page PDF will
end up embedding the whole font.
Also, there is extra overhead with a big font that seems to slow things
up a bit, even on a fast machine.
>> I'm assuming it will export the data in decomposed form ...
>> but haven't actually tried that yet ...
> I wouldn't assume anything of the sort. Normalising to NFD would be
Yes, I realize that now. And my test confirms that the internal
normalization is also what you get on export. And hence those 113 empty
boxes.
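A test like the one described can be sketched in Python: check which normalization form an exported value is already in, and decompose it again if NFD is what the font needs. The function name and the sample value are hypothetical.

```python
import unicodedata

def normalization_forms(text):
    """Return the subset of {'NFC', 'NFD'} that text already satisfies."""
    return {
        form
        for form in ("NFC", "NFD")
        if unicodedata.normalize(form, text) == text
    }

exported = "p\u1ebd"  # hypothetical exported field value

print(normalization_forms(exported))  # {'NFC'}

# Re-decompose after export if decomposed data is wanted
redecomposed = unicodedata.normalize("NFD", exported)
print([f"U+{ord(c):04X}" for c in redecomposed])  # ['U+0070', 'U+0065', 'U+0303']
```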
>> BTW, this application supports import of UTF-8, but will not export
>> UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal
>> storage form).
> Odd indeed.
Well, maybe they're saving UTF-8 export for a future release ... though
I can't imagine why.
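In the meantime, a UTF-16 export can be transcoded to UTF-8 outside the application. A minimal Python sketch, assuming the exported data begins with a byte order mark; without one, "utf-16-le" or "utf-16-be" would have to be named explicitly. The function name is hypothetical.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Transcode UTF-16 bytes (with BOM) to UTF-8; code points are unchanged."""
    return data.decode("utf-16").encode("utf-8")

# Stand-in for exported data; Python's "utf-16" codec prepends a BOM.
sample = "p\u1ebd".encode("utf-16")
converted = utf16_to_utf8(sample)

print(converted)  # b'p\xe1\xba\xbd'
```

Transcoding does not alter the normalization form, so precomposed characters stay precomposed.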
This archive was generated by hypermail 2.1.5 : Wed Oct 13 2004 - 09:13:57 CST