Re: Proposing UTF-21/24

From: Addison Phillips
Date: Tue Jan 23 2007 - 14:09:42 CST

    Marion wrote:

    > but the same kind of survey measurements taken in the earliest years
    > of Web activity would probably yield closer to 100% ASCII, which
    > would have been gravely wrong and very misleading in real terms (that
    > is, in terms of real needs of real users), so it would be, IMHO,
    > better to ignore such statistics and always return, as a rule of
    > thumb, to user needs (as distinct to user practice, which, in my
    > experience, can often be no more than a reflection of colonial
    > imposition, as a culture strives to survive against all the odds).

    If we were talking about the distribution of *language* text, I would
    agree. But the measurement of markup in relation to actual text is
    different. HTML tags and other markup are not language-bearing text and
    they consistently form about half the overall "content" of the textual
    part (as opposed to graphics or music files and such) of the Web. If we
    omit the markup, the range and relative distribution of various scripts
    has evolved over time, away from early domination by the Latin script
    towards a more "natural" distribution.

    So even if we all switched to cuneiform for writing our various
    languages, the total volume of the Web that used supplementary characters
    would only approach 50%, since half of the Web is angle brackets and
    such :-).
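
    The ceiling described above can be checked with a rough sketch. The
    function names here are hypothetical, and the half-markup ratio is
    the estimate given in this message, not measured data. "Outside the
    BMP" is taken to mean any code point above U+FFFF.

```python
def supplementary_share(markup_chars: int, text_chars: int,
                        supplementary_fraction: float) -> float:
    """Share of all characters that fall outside the BMP.

    markup_chars: count of markup characters (assumed to be ASCII).
    text_chars: count of language-bearing characters.
    supplementary_fraction: portion of text_chars above U+FFFF (0.0 to 1.0).
    """
    total = markup_chars + text_chars
    if total == 0:
        return 0.0
    return (text_chars * supplementary_fraction) / total


def measure(document: str) -> float:
    """Fraction of code points in a document that lie above U+FFFF."""
    if not document:
        return 0.0
    outside_bmp = sum(1 for ch in document if ord(ch) > 0xFFFF)
    return outside_bmp / len(document)


# Hypothetical page: 7 markup characters around 7 cuneiform signs (U+12000).
sample = "<p>" + "\U00012000" * 7 + "</p>"
```

    With markup and text in equal proportion, supplementary_share(50, 50, 1.0)
    gives 0.5 even when every text character is cuneiform, and
    measure(sample) gives the same figure for the hypothetical page.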

    Best Regards,


    Addison Phillips
    Globalization Architect -- Yahoo! Inc.
    Internationalization is an architecture.
    It is not a feature.

    This archive was generated by hypermail 2.1.5 : Tue Jan 23 2007 - 14:11:52 CST