From: Mark Davis ☕ (email@example.com)
Date: Mon Feb 14 2011 - 12:51:38 CST
*There are many caveats with any such data gathering, so don't put too much
reliance on the following figures.*
The relative proportions of languages on the web have English declining but
still at well over ⅓, Chinese (S) growing to about ⅐, then come Japanese,
German, Russian, Spanish, Korean, French, Polish, Chinese (T), Arabic,
Portuguese, Italian, Turkish, Dutch, with others less than 1% each.
In the last few years there has also been a bit more (relative) growth in
smaller languages. The distribution looks like the following:
1σ - top 6 languages
2σ - next 24 languages
3σ - next 37 languages
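
One way to read the σ figures above, as an assumption rather than something
stated in this message, is as cumulative coverage: the top 6 languages cover
roughly 68% of pages, the top 30 roughly 95%, and the top 67 roughly 99.7%,
matching the usual normal-distribution levels. A minimal Python sketch of that
bucketing follows. The function name `languages_per_band` and the sample
shares are hypothetical; they are not the data behind the figures above.

```python
from itertools import accumulate

# Conventional normal-distribution coverage for 1, 2 and 3 standard deviations.
# Mapping the σ labels to these levels is an assumption about the message.
SIGMA_LEVELS = {"1σ": 0.6827, "2σ": 0.9545, "3σ": 0.9973}


def languages_per_band(shares, levels=SIGMA_LEVELS):
    """Count how many additional languages each band needs to reach its level.

    `shares` holds one non-negative weight per language (any scale).
    """
    ranked = sorted(shares, reverse=True)
    total = sum(ranked)
    cumulative = [running / total for running in accumulate(ranked)]

    bands = {}
    already_counted = 0
    for label, level in levels.items():
        # Number of top-ranked languages needed to reach this coverage level;
        # fall back to all languages if rounding keeps the sum just below it.
        needed = next(
            (i + 1 for i, c in enumerate(cumulative) if c >= level),
            len(ranked),
        )
        bands[label] = needed - already_counted
        already_counted = needed
    return bands


# Made-up percentages, for illustration only.
sample_shares = [50, 20, 10, 10, 5, 3, 1.5, 0.5]
print(languages_per_band(sample_shares))  # {'1σ': 2, '2σ': 4, '3σ': 2}
```

Run on real per-language page counts, the same function would show whether
the 6 / 24 / 37 split reproduces under this reading.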
*— The best is the enemy of the good —*
On Mon, Feb 14, 2011 at 03:17, Marion Gunn <firstname.lastname@example.org> wrote:
> The most common letter in English text is "e". Does its high frequency on
> the web just confirm that most web content is still in English?
> On 14/02/2011 10:57, Charlie Ruland wrote:
>> * Mark Davis ☕ [2011-02-14 03:26]:
>>> As it turns out, when looking at HTML pages on the web (with a good-sized
>>> sample from work here at Google), SPACE is the most frequent character (by a
>>> huge margin). That is even true on Chinese pages, just because of the
>>> proportion of markup on pages.
>>> For those interested, the most frequent Alphabetic is 'e'.
>> I would be interested if I wanted to compress the entire contents of the
>> Web, but I don’t.
>> /— The best is the enemy of the good —/
>> /— Reason is the enemy of the dreamer —/
> Marion Gunn * eGteo (Estab.1991)
> 27 Páirc an Fhéithlinn, Baile an
> Bhóthair, An Charraig Dhubh,
> Co. Átha Cliath, Éire/Ireland
> * email@example.com * firstname.lastname@example.org *