Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 4 Feb 2013 23:54:41 +0100

2013/2/3 Costello, Roger L. <costello_at_mitre.org>:
> - It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters.

But MOST texts using combining diacritics are written in languages for
which standard keyboards featuring combining keys or dead keys already
exist. These standard keyboards are handy, do not require a lot of
compose key sequences, and are available by default "out of the box"
in almost all OSes.
All these keyboards compose letters in precombined forms, which are
already NFC for most Latin/Cyrillic/Greek text. The exception is
Vietnamese, for which there are several competing keyboards allowing
input in various forms (not always normalized, but most often at
least partly precombined by pairs).
Hebrew is most often typed and encoded in precombined form (for the
most frequent diacritics, notably the SIN and SHIN dots). Even so, you
will sometimes find characters that are not in any standard
normalization form (neither NFC nor NFD).
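
To make the three cases above concrete, here is a small sketch using
Python's standard unicodedata module. The sample strings and the
helper name forms() are my own choices for illustration, not the
output of any particular keyboard driver.

```python
import unicodedata

def forms(s):
    """Return the set of forms among NFC/NFD that s already satisfies."""
    return {f for f in ("NFC", "NFD") if unicodedata.normalize(f, s) == s}

# Typical dead-key output: precomposed e-acute (U+00E9), already NFC.
e_acute = "\u00E9"

# Vietnamese e with circumflex and dot below, entered as a partly
# precombined pair: e-circumflex (U+00EA) + combining dot below (U+0323).
viet_pair = "\u00EA\u0323"

# Hebrew SHIN WITH SHIN DOT (U+FB2A), a precombined character that is
# excluded from composition, so normalization always splits it.
shin_dot = "\uFB2A"

print(forms(e_acute))    # only NFC
print(forms(viet_pair))  # neither form
print(forms(shin_dot))   # neither form

# NFC turns the Vietnamese pair into the single code point U+1EC7.
print(unicodedata.normalize("NFC", viet_pair) == "\u1EC7")
```

The last two samples are in neither NFC nor NFD, which is the
situation a recipient has to be prepared for.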

So you cannot recommend any one form. But if the W3C needs to update
something, it is to say that ALL forms that are canonically equivalent
should be treated equally. This means that it is up to the recipient
of encoded documents to perform its own normalization.
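
A minimal sketch of what "treated equally" could mean on the
recipient's side, again with Python's unicodedata. The function name
canonically_equivalent() is hypothetical, and NFD is an arbitrary
choice here: comparing in NFC gives the same answer.

```python
import unicodedata

def canonically_equivalent(a, b):
    """Compare two strings after bringing both to the same form (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# Precomposed e-acute versus "e" followed by a combining acute accent.
print(canonically_equivalent("\u00E9", "e\u0301"))

# Fully precomposed Vietnamese letter versus a partly precombined pair.
print(canonically_equivalent("\u1EC7", "\u00EA\u0323"))

# A plain "e" is a different text, not an equivalent spelling.
print(canonically_equivalent("e", "\u00E9"))
```

The sender's form does not matter here: the recipient normalizes both
sides itself before comparing.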

But there is no recommendation about which normalization the recipient
will use: for rendering, NFC is generally easier; for searching, NFD
may be better, but the more effective way to search is to use
collation, which already includes an NFD step in the standardized
algorithm (though a collator can also be written so that it does not
require prior normalization when none is needed). Unicode also
describes some properties that allow processing with "fast
normalization checks"; these can be used for creating conforming
collators.
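
As a rough sketch of the "normalize only when needed" idea applied to
searching: the helpers to_nfd() and contains() below are hypothetical
names, and unicodedata.is_normalized() (available from Python 3.8)
stands in for the fast check. As far as I know it relies on the
quick-check data, but treat that as an assumption. This is a plain
substring match on NFD text, not a collation-based search.

```python
import unicodedata

def to_nfd(s):
    """Return s in NFD, skipping the conversion when s is already NFD."""
    if unicodedata.is_normalized("NFD", s):
        return s
    return unicodedata.normalize("NFD", s)

def contains(haystack, needle):
    """Substring search that ignores the composed/decomposed distinction."""
    return to_nfd(needle) in to_nfd(haystack)

# Precomposed text, decomposed query: still found.
print(contains("caf\u00E9 au lait", "cafe\u0301"))

# Decomposed text, precomposed query: still found.
print(contains("cafe\u0301 au lait", "caf\u00E9"))

# Pure ASCII is already NFD, so no new string is built.
print(to_nfd("plain ascii") is "plain ascii")
```

One caveat: a bare substring match on NFD text will also find "e"
inside a decomposed e-acute, because it knows nothing about grapheme
boundaries or accent levels. Handling that properly is what a real
collator is for.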
Received on Mon Feb 04 2013 - 17:00:23 CST
