From: John Cowan
Date: Wed Mar 01 2000 - 12:42:26 EST

Kenneth Whistler wrote:

> The raw figures are posted below.


> These constitute the lumped sums from both the MUMS Books database and
> the JACKPHY database, containing 12,421,528 instances of characters with
> diacritics, out of a total of 1,492,948,727 Latin characters.

BTW, the JACKPHY database (IIRC) holds bibliographic information (in
Latin-alphabet transliteration) for books written in non-Latin scripts,
so it represents "non-native" uses of diacritics.

An interesting point about ANSEL is that it treats u-horn and o-horn
as distinct letters, like eth and ae, rather than as u and o with a
COMBINING HORN as Unicode does. Since HORN is not applied to any
other letters, I wonder why the Unicode designers analyzed it out
(doing so saved only 3 codepoints).
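
For anyone who wants to see how the horn letters are analyzed, here is
a minimal sketch using Python's standard unicodedata module. It assumes
the character database shipped with current Python versions, in which
the horn letters appear both as precomposed characters and as a base
letter plus U+031B COMBINING HORN. The helper name describe() is made
up for illustration.

```python
import unicodedata

# The four precomposed horn letters: O/o WITH HORN and U/u WITH HORN.
HORN_LETTERS = "\u01a0\u01a1\u01af\u01b0"

def describe(ch):
    """Return the character's name and the names of its NFD components."""
    parts = unicodedata.normalize("NFD", ch)
    return unicodedata.name(ch), [unicodedata.name(p) for p in parts]

for ch in HORN_LETTERS:
    name, parts = describe(ch)
    print(f"U+{ord(ch):04X} {name} = {' + '.join(parts)}")
```

Normalizing the other way (NFC) recombines a base letter followed by
U+031B into the single precomposed letter, so the two spellings compare
as canonically equivalent.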


