Re: Normalization rate on the Web

From: Andrew Cunningham <lang.support_at_gmail.com>
Date: Tue, 22 Jan 2013 11:47:10 +1100

Hi Denis,

A few thoughts ... library data may be NFC or NFD, but is more likely to
conform to the MARC character repertoire, so isn't exactly NFD.

Vietnamese data is either 1) NFC or 2) neither NFC nor NFD.

It would be rare to find Vietnamese data in NFD.

For a range of African languages, mainly ones using diacritics and diacritic
stacking, it may be 1) NFC, 2) NFD or 3) neither NFC nor NFD, depending on
the input framework used.
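
To make the three cases concrete, here is a minimal Python sketch of how
one might classify a string. It is only an illustration: the function name
classify_normalization and the sample strings are hypothetical, and the
"neither" sample assumes the pattern of a precomposed vowel followed by a
combining tone mark, which is one way such data can arise.

```python
import unicodedata


def classify_normalization(text):
    """Report whether text is already in NFC, NFD, both, or neither.

    A string is "in" a form if normalizing it to that form changes nothing.
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"      # e.g. plain ASCII, where the two forms coincide
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"


# Three encodings of the same Vietnamese letter (e + circumflex + dot below).
precomposed = "\u1ec7"        # single code point
decomposed = "e\u0323\u0302"  # base letter + dot below + circumflex
mixed = "\u00ea\u0323"        # e-circumflex + combining dot below

for sample in (precomposed, decomposed, mixed, "abc"):
    print(ascii(sample), classify_normalization(sample))
```

Running something like this per page or per text node over a crawl sample
would give a rough estimate of the proportions, although text that is
"both" (mostly ASCII) will dominate and probably needs to be counted
separately.
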
On Jan 22, 2013 3:26 AM, "Denis Jacquerye" <moyogo_at_gmail.com> wrote:

> Does anybody have any idea of how much of the Web is normalized in NFC
> or NFD? Or how much not normalized?
>
> How would one find out or try to make a smart guess?
>
> I know a lot of library catalogue data is in NFD or somewhat
> decomposed. Is there any other field that heavily uses decomposition?
>
> --
> Denis Moyogo Jacquerye
> African Network for Localisation http://www.africanlocalisation.net/
> Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
> DejaVu fonts --- http://www.dejavu-fonts.org/
>
>
>
Received on Mon Jan 21 2013 - 18:49:25 CST
