From: Phillips, Addison (email@example.com)
Date: Sun Feb 22 2009 - 23:18:16 CST
The Mac thing is overblown. Macs use NFD in their filesystems, but they don't generate any more non-NFC file content than any other system does. So Mark's data is entirely reasonable and within expectations... for the Web as a whole.
There are languages for which "0.02%" is not a useful metric (hence the whole impetus for a FAQ). But as an overall measure, it's not a surprising number. Note that languages for which non-normalized data is likely to appear are also likely to be disadvantaged languages with *comparatively* small presence on the Internet today.
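For anyone who wants to see the numbers concretely, here is a minimal Python sketch. It only redoes the arithmetic from Mark's quoted sample and shows how a single string can be tested for NFC using the standard `unicodedata` module. The sample string and the `is_nfc` helper are illustrative assumptions of mine, not anything taken from Mark's measurements.

```python
import unicodedata

# Figures taken from the quoted sample: 999,800 characters that need no
# normalization and 200 that do.
already_nfc = 999_800
needs_nfc = 200
share = needs_nfc / (already_nfc + needs_nfc)
print(f"{share:.2%}")  # share of characters needing normalization


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


# Hypothetical example: the same name in precomposed and decomposed form.
nfc_name = "caf\u00e9"                             # 'é' as one code point
nfd_name = unicodedata.normalize("NFD", nfc_name)  # 'e' + U+0301 combining acute

print(is_nfc(nfc_name), is_nfc(nfd_name))
print(len(nfc_name), len(nfd_name))
```

The decomposed form is what a filesystem that stores names in NFD would hand back; text typed into a document is a separate matter, which is the distinction drawn above.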
Globalization Architect -- Lab126
Internationalization is not a feature.
It is an architecture.
> -----Original Message-----
> From: firstname.lastname@example.org [mailto:email@example.com]
> On Behalf Of Doug Ewell
> Sent: Sunday, February 22, 2009 6:13 PM
> To: Unicode Mailing List
> Subject: Re: NFC FAQ
> Mark Davis wrote:
> > an illustrative sample simulating documents would be
> > simulating content:
> > 999,800 characters (82% being ASCII, then Cyrillic, Han, Arab,
> > Latin, ...) not needing normalization, and
> > 200 characters needing normalization,
> If you did happen to run into some data that started out in NFD --
> generated on a Mac -- you'd have a lot more than 0.02% of content
> characters needing normalization.
> Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
> http://www.alvestrand.no/mailman/listinfo/ietf-languages