Re: Text in composed normalized form is king, right? Does anyone generate text in decomposed normalized form?

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Mon, 04 Feb 2013 20:16:47 +0900

Hello Roger,

The answer to your question below is a very clear NO. The reason is
that most text is already in NFC. In fact, as I wrote a few days or
weeks ago, NFC was defined to capture what's usually around on the Web
(and in other places, too). Trying to recommend that everything be in
NFD when more than 99% is already in NFC, and that won't change any time
soon, just doesn't make sense.

Also, most of the statements you have below need more qualifiers. For
example, only a very, very small minority of people ever need to input
"all" possible composed characters (and on top of that, some clever
software can do the normalization to NFC while the input is happening).
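
As a minimal sketch of that last point, assuming Python and its standard
unicodedata module: input typed as a base letter plus a combining accent
can be converted to NFC before it is stored or sent. The helper name
to_nfc is hypothetical and only for illustration; it does not describe
any particular input method.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in NFC (hypothetical helper for an input pipeline)."""
    return unicodedata.normalize("NFC", text)


# 'u' followed by U+0308 COMBINING DIAERESIS, as a user might type it
decomposed = "Du\u0308rst"
composed = to_nfc(decomposed)

# The combining sequence collapses into the single code point U+00FC
print(len(decomposed), len(composed))  # 6 5
print(composed == "D\u00fcrst")        # True
```

Converting back with unicodedata.normalize("NFD", composed) gives the
decomposed sequence again, so software that prefers NFD for searching or
matching can still derive it locally from NFC text.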

Regards, Martin.

On 2013/02/03 22:27, Costello, Roger L. wrote:
> Hi Folks,
>
> Thank you for your excellent responses.
>
> Based on your responses, I now wonder why the W3C recommends NFC be used for text exchanges over the Internet. Aside from the size advantage of NFC, there seem to be tremendous advantages to using NFD:
>
> - It’s easier to do searches and other text processing on NFD-encoded text.
>
> - NFD makes the regular expressions used to qualify its contents much, *much* simpler.
>
> - Things like fuzzy text matching are probably easier in NFD.
>
> - It’s easier to remember a handful of useful composing accents than the much larger number of combined forms.
>
> - It is easier to use a few keystrokes for combining accents than to set up compose key sequences for all the possible composed characters.
>
> - Some Unicode-defined processes, such as capitalization, are not guaranteed to preserve normalization forms.
>
> - Some operating systems store filenames in NFD encoding.
>
> The W3C is currently updating their recommendations [1]:
>
> This version of this document was published to
> indicate the Internationalization Core Working
> Group's intention to substantially alter or replace
> the recommendations found here with very different
> recommendations in the near future.
>
> Would you recommend that the W3C change their recommendation from:
>
> Use NFC when exchanging text over the Internet.
>
> to:
>
> Use NFD when exchanging text over the Internet.
>
> Would that be your recommendation to the W3C?
>
> /Roger
>
> [1] http://www.w3.org/TR/charmod-norm/
Received on Mon Feb 04 2013 - 05:23:02 CST