Re: Usage stats? from Markus Scherer on 2015-03-27 (Unicode Mail List Archive)

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Fri, 27 Mar 2015 13:56:23 -0700

On Fri, Mar 27, 2015 at 1:27 PM, Michael Norton
<michaelanortonster_at_gmail.com> wrote:

> Easy example: what's the code for [blank space] U+020 across all language
> sets of Unicode? Is it the same ie: 100%?
>

I don't understand what you are asking, and I have a hunch that the way you
have phrased it, nobody else understands it either.

The code point value that the Unicode Standard assigns to the normal space
is U+0020, but
- not every language uses spaces
- not every language that uses spaces uses them for the same purpose as
  English
- there are some 30 other "space" characters in Unicode
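
To make the last point concrete, here is a minimal sketch that lists the
code points Python's unicodedata module classifies as space separators
(General_Category Zs). The helper name space_separators is made up for this
illustration. Zs is only one possible definition of "space": the total
differs if you use the wider White_Space property or also count zero-width
and format characters, and it depends on the Unicode version bundled with
your Python build.

```python
import sys
import unicodedata


def space_separators():
    """Return every code point whose General_Category is Zs (hypothetical helper)."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]


zs = space_separators()

# Print each space separator with its code point and character name.
for cp in zs:
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp), '<unnamed>')}")
```

U+0020 SPACE is in that list, along with characters such as U+00A0 NO-BREAK
SPACE and U+3000 IDEOGRAPHIC SPACE.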

Statistics of character frequencies vary by corpus, as others have said.
Even if you "only" look "on the web", that's undefined until you specify a
crawling strategy. Dynamically generated content means that there is an
infinite number of "web pages". Every crawler will come up with a different
set.
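
A small sketch of why the corpus matters: the same counting routine gives a
very different share for U+0020 depending on the text it is fed. The two
sample strings and the function name char_frequencies are assumptions chosen
for illustration only; they are not a meaningful corpus.

```python
from collections import Counter


def char_frequencies(text):
    """Map each character in text to its relative frequency (hypothetical helper)."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


# Two tiny illustrative samples, one English and one Japanese.
english = "the quick brown fox jumps over the lazy dog"
japanese = "吾輩は猫である。名前はまだ無い。"

space_share_en = char_frequencies(english).get(" ", 0.0)
space_share_ja = char_frequencies(japanese).get(" ", 0.0)

print(f"share of U+0020 in English sample:  {space_share_en:.3f}")
print(f"share of U+0020 in Japanese sample: {space_share_ja:.3f}")
```

The English sample contains spaces between words, the Japanese sample
contains none, so any "global" figure is just a reflection of how the two
are mixed in the corpus.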

Maybe you are asking about statistics of character encodings? On the web?
Such as, Unicode vs. Shift-JIS vs. ISO 8859-2 etc.?
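
If the question is about encodings, the distinction can be sketched as
follows: the code point stays the same, while the bytes depend on the
encoding. The codec names below are Python's spellings for UTF-8, Shift-JIS
and ISO 8859-2; the helper encode_everywhere is a made-up name for this
example.

```python
ENCODINGS = ["utf-8", "shift_jis", "iso8859_2"]


def encode_everywhere(text):
    """Encode text in each encoding; None marks 'not representable' (hypothetical helper)."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = text.encode(enc)
        except UnicodeEncodeError:
            result[enc] = None
    return result


# U+0020 SPACE and U+3042 HIRAGANA LETTER A as test characters.
space_bytes = encode_everywhere("\u0020")
hiragana_bytes = encode_everywhere("\u3042")

for enc in ENCODINGS:
    print(enc, space_bytes[enc], hiragana_bytes[enc])
```

The space comes out as the single byte 0x20 in all three, whereas the
hiragana letter has different byte sequences in UTF-8 and Shift-JIS and
cannot be represented in ISO 8859-2 at all.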

markus

Received on Fri Mar 27 2015 - 15:57:31 CDT
