RE: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Costello, Roger L. <costello_at_mitre.org>
Date: Sun, 3 Feb 2013 13:27:29 +0000

Hi Folks,

Thank you for your excellent responses.

Based on your responses, I now wonder why the W3C recommends that NFC be used for text exchanged over the Internet. Aside from the size advantage of NFC, there seem to be tremendous advantages to using NFD:

- It’s easier to do searches and other text processing on NFD-normalized text.

- NFD makes the regular expressions used to qualify its contents much, *much* simpler.

- Things like fuzzy text matching are probably easier in NFD.

- It’s easier to remember a handful of useful combining accents than the much larger number of precomposed forms.

- It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters.

- Some Unicode-defined processes, such as capitalization, are not guaranteed to preserve normalization forms.

- Some operating systems store filenames in NFD.
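
To make the size, search, and regular-expression points concrete, here is a minimal sketch in Python using only the standard `unicodedata` and `re` modules. The helper names (`to_nfc`, `to_nfd`, `strip_marks`, `E_WITH_MARKS`) are made up for illustration, and the regular expression assumes that only the Combining Diacritical Marks block (U+0300 to U+036F) matters, which is a simplification.

```python
import re
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Return text in Normalization Form D (decomposed)."""
    return unicodedata.normalize("NFD", text)


def strip_marks(text: str) -> str:
    """Hypothetical helper: decompose, then drop every combining mark.

    unicodedata.combining() returns a non-zero canonical combining
    class for combining marks and 0 for base characters.
    """
    return "".join(ch for ch in to_nfd(text) if not unicodedata.combining(ch))


# Illustrative pattern: a base letter "e" followed by any number of
# marks from the Combining Diacritical Marks block (an assumption;
# other blocks also contain combining marks).
E_WITH_MARKS = re.compile("e[\u0300-\u036f]*")

composed = "caf\u00e9"          # "café" with precomposed U+00E9
decomposed = to_nfd(composed)   # "cafe" followed by U+0301

# Size: NFC is shorter, both in code points and in UTF-8 bytes.
nfc_sizes = (len(composed), len(composed.encode("utf-8")))      # (4, 5)
nfd_sizes = (len(decomposed), len(decomposed.encode("utf-8")))  # (5, 6)

# Search: once decomposed, an accent-insensitive comparison is a filter.
plain = strip_marks(composed)   # "cafe"

# Regular expressions: the pattern finds both accented letters in NFD
# text, but nothing in NFC text, where U+00E9 is a single code point.
nfd_hits = E_WITH_MARKS.findall(to_nfd("\u00e9t\u00e9"))
nfc_hits = E_WITH_MARKS.findall(to_nfc("\u00e9t\u00e9"))
```

In this sketch the two forms round-trip through `to_nfc` and `to_nfd`, so a receiver can convert to whichever form suits its processing regardless of the form used for exchange.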

The W3C is currently updating their recommendations [1]:

    This version of this document was published to
    indicate the Internationalization Core Working
    Group's intention to substantially alter or replace
    the recommendations found here with very different
    recommendations in the near future.

Would you recommend that the W3C change their recommendation from:

    Use NFC when exchanging text over the Internet.

to:

    Use NFD when exchanging text over the Internet.

Would that be your recommendation to the W3C?

/Roger

[1] http://www.w3.org/TR/charmod-norm/
Received on Sun Feb 03 2013 - 07:31:55 CST
