RE: outside decomposed, inside precomposed

From: Jon Hanna (jon@hackcraft.net)
Date: Wed Oct 13 2004 - 04:15:51 CST

Next message: Richard Cook: "Re: outside decomposed, inside precomposed"

Previous message: Doug Ewell: "Re: UTF-8 stress test file?"
In reply to: Richard Cook: "outside decomposed, inside precomposed"
Next in thread: Richard Cook: "Re: outside decomposed, inside precomposed"
Reply: Richard Cook: "Re: outside decomposed, inside precomposed"
Reply: Eric Muller: "Re: outside decomposed, inside precomposed"

> imported UTF-8 sequences like [U+0065][U+0303] <e, tilde> get remapped
> internally to [U+1ebd] LATIN SMALL LETTER E WITH TILDE.
>
> Is this kind of behavior what one would expect?

That's conformant. If it causes problems for any other process (including
other processes that are part of the system in question), then that other
process isn't complying with conformance clause C9.

At a guess I'd say it's probably normalising to NFC, which is advantageous in
a lot of ways (for example, you should do this with data that has to conform
with the web's [draft] character model).
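
To make the remapping concrete, here is a minimal Python sketch of NFC
applied to the sequence from the original question. It uses the standard
unicodedata module; that the application in question does exactly this
internally is only my guess, as said above.

```python
import unicodedata

# <e, combining tilde>, as in the imported data: U+0065 U+0303
decomposed = "\u0065\u0303"

# NFC composes canonically equivalent sequences where a precomposed
# character exists.
composed = unicodedata.normalize("NFC", decomposed)

# The result is the single code point U+1EBD.
code_points = [f"U+{ord(ch):04X}" for ch in composed]
name = unicodedata.name(composed)
```

Here code_points holds a single entry and name is the character name
quoted in the original message.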

One of the clearest advantages is that it makes searching a lot more
efficient, as only one of the potentially very many canonically equivalent
sequences has to be searched for (though case-insensitive and/or
diacritic-insensitive searches will still have many possible matching
strings).
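
A rough sketch of the searching point, again in Python. The helper names
nfc and contains are made up for illustration. If stored data is already
in NFC, only the search term needs normalising; the sketch normalises both
sides so that it stands alone.

```python
import unicodedata

def nfc(text):
    """Return text in Normalization Form C."""
    return unicodedata.normalize("NFC", text)

def contains(haystack, needle):
    """Substring test that treats canonically equivalent input alike."""
    return nfc(needle) in nfc(haystack)

stored = "pr\u1ebdt"      # precomposed e-with-tilde inside a word
query = "e\u0303"         # the same letter typed as <e, combining tilde>

naive_hit = query in stored            # code-point comparison misses it
normalised_hit = contains(stored, query)
```

Without normalisation the search would have to try every canonically
equivalent spelling of the query; with it, one comparison is enough.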

On the other hand there are potential security risks with such
normalisation, and perhaps therefore it is something that should be
configurable.
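
One illustration of the kind of risk I mean (there are others, and this
particular scenario is an assumption for the sake of example): a check made
before normalisation can be bypassed if the data is normalised afterwards.
U+212A KELVIN SIGN is canonically equivalent to an ordinary capital K, so
NFC replaces it. The blocklist and names below are hypothetical.

```python
import unicodedata

blocklist = {"Ken"}                 # hypothetical reserved name

submitted = "\u212aen"              # KELVIN SIGN followed by "en"

# Check performed on the raw input: the name looks unreserved.
passes_early_check = submitted not in blocklist

# The system later normalises to NFC before storing.
stored = unicodedata.normalize("NFC", submitted)

# After normalisation the stored value collides with the reserved name.
collides_after_storage = stored in blocklist
```

The usual remedy is to normalise first and validate afterwards, which is
one reason a process might want control over when normalisation happens.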

> It's problematic (and buglike) for at least one reason: one needs to
> put all these precomposed things in one's font, or FileMaker doesn't
> display them properly.

That's where the problem lies, not in the normalisation.

> I'm assuming it will export the data in decomposed form ... but haven't
> actually tried that yet ...

I wouldn't assume anything of the sort. Normalising to NFD would be quite
unusual.
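
If decomposed data really is needed downstream, it is safer to decompose
the export yourself than to rely on the application. A minimal Python
sketch, assuming the exported text is available as a string; the function
name to_nfd is made up for illustration.

```python
import unicodedata

def to_nfd(text):
    """Return text in Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)

exported = "\u1ebd"                 # precomposed, as held internally
decomposed = to_nfd(exported)       # back to <e, combining tilde>

# Canonical normalisation round-trips for this sequence.
round_trip = unicodedata.normalize("NFC", decomposed)
```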

>
> BTW, this application supports import of UTF-8, but will not export
> UTF-8. That's odd, isn't it? It'll only export UTF-16 (its internal
> storage form).

Odd indeed.
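
As a workaround, a UTF-16 export is easily transcoded. A small Python
sketch, assuming the export begins with a byte order mark (which the
"utf-16" codec uses to pick the byte order); the function name
utf16_to_utf8 is made up for illustration.

```python
def utf16_to_utf8(data):
    """Transcode UTF-16 bytes (with BOM) to UTF-8 bytes."""
    # The "utf-16" codec consumes the BOM and selects the byte order.
    return data.decode("utf-16").encode("utf-8")

# Stand-in for an exported file: BOM followed by U+1EBD.
sample_export = "\u1ebd".encode("utf-16")
utf8_bytes = utf16_to_utf8(sample_export)
```

Transcoding changes only the encoding form; it does not alter whether the
text is composed or decomposed.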

Regards,
Jon Hanna
<http://www.selkieweb.com/>

This archive was generated by hypermail 2.1.5 : Wed Oct 13 2004 - 04:19:58 CST