RE: NFC FAQ

From: Phillips, Addison (addison@amazon.com)
Date: Sun Feb 22 2009 - 23:18:16 CST


    The Mac thing is overblown. Macs use NFD in their filesystems, but they don't generate any more non-NFC file content than any other system does. So Mark's data is entirely reasonable and within expectations... for the Web as a whole.
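    To make the filesystem point concrete, here is a small sketch. It assumes a file name comes back from an NFD-based filesystem in decomposed form (HFS+ actually uses a variant of NFD) and simulates that with Python's `unicodedata.normalize`. The helper `same_name` is hypothetical, for illustration only. The decomposed name is a different code point sequence from the precomposed one a user would type, even though the two are canonically equivalent. Normalizing both sides to NFC before comparing makes them match.

```python
import unicodedata

def same_name(a, b):
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# "café" as typically typed: precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE.
typed = "caf\u00e9"
# The same name decomposed (e + U+0301 COMBINING ACUTE ACCENT), simulating
# what an NFD-style filesystem might hand back.
from_fs = unicodedata.normalize("NFD", typed)

print(typed == from_fs)                          # False: different code points
print(same_name(typed, from_fs))                 # True: canonically equivalent
print(len(typed), len(from_fs))                  # 4 5
print(unicodedata.is_normalized("NFC", from_fs)) # False (Python 3.8+)
```

    Only the names are affected in this scenario. The bytes written *into* such a file are whatever the application produced, which is why an NFD filesystem alone doesn't push much non-NFC text onto the Web.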

    There are languages for which "0.02%" is not a useful metric (hence the whole impetus for a FAQ). But as an overall measure, it's not a surprising number. Note that the languages in which non-normalized data is likely to appear also tend to be disadvantaged languages with a *comparatively* small presence on the Internet today.
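    A figure like "0.02%" is just the share of characters that NFC would change: 200 out of 999,800 + 200 = 1,000,000. For a particular corpus you can estimate a comparable number. The function `non_nfc_share` below is a hypothetical heuristic, not the method behind Mark's data. It groups each code point with canonical combining class 0 together with any combining marks that follow it, then counts the code points in groups whose NFC form differs. It is only an approximation: compositions between two class-0 code points (for example conjoining Hangul jamo) are not detected.

```python
import unicodedata

def non_nfc_share(text):
    """Approximate fraction of code points in `text` that NFC would change.

    Heuristic sketch: a segment starts at a code point with combining
    class 0 and absorbs the combining marks after it; a segment counts as
    "needing normalization" if NFC(segment) != segment.
    """
    if not text:
        return 0.0
    segments = []
    for ch in text:
        if segments and unicodedata.combining(ch) != 0:
            segments[-1] += ch   # combining mark joins the current segment
        else:
            segments.append(ch)  # class-0 code point starts a new segment
    changed = sum(len(s) for s in segments
                  if unicodedata.normalize("NFC", s) != s)
    return changed / len(text)

# Mark Davis's illustrative sample: 200 of 1,000,000 characters.
print(f"{200 / (999_800 + 200):.2%}")   # 0.02%

sample = "café " * 4 + "plain ASCII text"
print(non_nfc_share(sample))                                 # 0.0
print(non_nfc_share(unicodedata.normalize("NFD", sample)))   # 0.2
```

    The last line illustrates Doug's point: the same text, once decomposed, has a far higher share of characters needing normalization than an average across Web content would suggest.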

    Addison

    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org]
    > On Behalf Of Doug Ewell
    > Sent: Sunday, February 22, 2009 6:13 PM
    > To: Unicode Mailing List
    > Subject: Re: NFC FAQ
    >
    > Mark Davis wrote:
    >
    > > an illustrative sample simulating documents would be
    > >
    > > simulating content:
    > >
    > > 999,800 characters (82% being ASCII, then Cyrillic, Han, Arab, other
    > > Latin, ...) not needing normalization, and
    > >
    > > 200 characters needing normalization,
    >
    > If you did happen to run into some data that started out in NFD --
    > say, generated on a Mac -- you'd have a lot more than 0.02% of content
    > characters needing normalization.
    >
    > --
    > Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
    > http://www.ewellic.org
    > http://www1.ietf.org/html.charters/ltru-charter.html
    > http://www.alvestrand.no/mailman/listinfo/ietf-languages
    >



    This archive was generated by hypermail 2.1.5 : Sun Feb 22 2009 - 23:21:35 CST