Re: Usage stats?

From: Doug Ewell <>
Date: Sat, 28 Mar 2015 10:52:35 -0600

Michael Norton wrote:

> Thanks Doug. I did not know there exists a representative sample of
> the world's text. :)

There is not, which was the point.

Thanks for reposting a private message back to the list, by the way. 💢

> Your frequency chart is great. The average char appearance is 2.91%.
> Only 34% from your list exceed 10% of it. Therefore, U+0020 is the
> elephant in the room (i.e. 15.05% is far > 2.91%). In fact, it's
> almost >50% greater than the next most-appearing character.

Words in English are separated by spaces, and the average English word
is about 5 letters long. It follows that English text will contain a lot
of spaces. You can eyeball this.
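
To put a rough number on that: if every word is followed by exactly one space, a 5-letter average word length implies about one space per six characters. The sketch below is only an illustration under that assumption (it ignores punctuation, line breaks, and everything else in real text); the function names and the sample string are made up for this example and are not part of the frequency chart discussed here.

```python
from collections import Counter


def expected_space_share(avg_word_len: float) -> float:
    """Share of spaces if each word is followed by exactly one space.

    Assumption: text is nothing but words and single separating spaces,
    so there is 1 space per (avg_word_len + 1) characters.
    """
    return 1.0 / (avg_word_len + 1.0)


def space_share(text: str) -> float:
    """Observed fraction of characters in `text` that are U+0020."""
    if not text:
        return 0.0
    return Counter(text)[" "] / len(text)


# Hypothetical sample; substitute any English text to compare.
sample = (
    "Words in English are separated by spaces, and the average "
    "English word is about 5 letters long."
)

print(f"expected for 5-letter words: {expected_space_share(5):.1%}")
print(f"observed in sample:          {space_share(sample):.1%}")
```

For 5-letter words the estimate comes out near 16.7%, which is in the same neighborhood as the roughly 15% figure quoted above, so a large share for U+0020 is what one would expect from English text.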

> Only 34% from your list exceed 10% of the average percentile (2.9%).
> This is serendipitously common (e.g. the Earth:Moon albedo ratio is
> .36). A relationship about motion and other natural properties and
> characteristics among the local texts begins to emerge.


Doug Ewell | | Thornton, CO 🇺🇸
Unicode mailing list
Received on Sat Mar 28 2015 - 11:54:29 CDT
