Question on Unicode-prevalence (general and for Cyrillic)

From: Deborah W. Anderson (dwanders@pacbell.net)
Date: Sun Mar 14 2004 - 15:25:36 EST


    Two questions:

    1. Is there a way to determine how prevalent Unicode is in electronic documents (versus documents in other encodings)? At least for the Web, has anyone done a statistical sampling to determine the percentage of Unicode-encoded webpages?

    2. A graduate student mentioned her impression that most Cyrillic webpages (at least Russian ones, her area of interest) are still not encoded in Unicode. (She is researching the use of certain words in Russian and wants to know how best to run the search.)
    So again: has anyone looked at what percentage of Cyrillic Web documents are in Unicode?

    With thanks,
    Debbie Anderson

    Deborah Anderson
    Researcher, Dept. of Linguistics
    UC Berkeley
    Email: dwanders@socrates.berkeley.edu
    or dwanders@pacbell.net
    Script Encoding Initiative: www.linguistics.berkeley.edu/~dwanders
     


