Re: Wanted: An Internet Unicode Meter

From: Don Osborn (dzo@bisharat.net)
Date: Wed Jul 26 2006 - 15:36:53 CDT


    Hi Daniel,

    There was a workshop in Bamako last month sponsored by the African
    Academy of Languages, the Language Observatory, and the Japan Science
    and Technology Agency (see
    http://gii2.nagaokaut.ac.jp/giiblog/blog/lopdiary.php?itemid=715 ), which
    dealt with surveying African languages on the web. One of the things they
    did, according to Tunde Adegbola (who was there, and whom you will recall
    from the Casablanca workshop last year), was to introduce something called
    the Language Identification Module (LIM).

    This might answer your first question. I'll cc Tunde, and I need to write
    Yoshiki Mikami and Shigeaki Kodama about it anyway. I'll also cc A12n-forum,
    where this subject came up before, in the interest of broadening the
    information and dialogue on what sounds like a project of wider interest
    that has received relatively little attention.

    All the best.

    Don

    Don Osborn
    Bisharat.net
    PanAfrican Localisation Project

    ----- Original Message -----
    From: "Daniel Yacob" <unicode@geez.org>
    To: <unicode@unicode.org>
    Sent: Wednesday, July 26, 2006 1:01 PM
    Subject: Wanted: An Internet Unicode Meter

    > Greets,
    >
    > I was asked twice within a week recently how many Amharic documents
    > were on the internet and I could only guess at a figure. So it
    > dawned on me that it would be a nice service if search engine
    > companies could provide some statistics based on language (if
    > identified) and script. Perhaps these stats are available and
    > I just wasn't able to find them?
    >
    > Going a step further, stats on a per character basis, or even a
    > property basis would be useful and not just academically interesting.
    > The practical application that comes to mind would be as a survey
    > of Unicode usage. Under-utilized blocks, even dead zones, could be
    > identified, which would indicate where community outreach was needed.
    >
    > I think it would be in the Unicode Consortium's best interest to
    > be aware of these stats (as well as related stats such as Unicode
    > use vs. other encoding systems, and growth over time), so as to know
    > where to focus efforts in promoting adoption of the standard.
    >
    > So if the Unicode Consortium could work on a character meter with a
    > major indexing/searching service, such as Google for example, that
    > would be dandy. Do we know anyone at that intersection? ;-)
    >
    > cheers,
    >
    > /Daniel
    >
    >
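    The per-block tally Daniel describes could be sketched roughly as follows.
    This is only an illustration: the block ranges are a small hand-picked
    subset (including Ethiopic, relevant to Amharic), whereas a real meter
    would load the full block list from the Unicode Character Database's
    Blocks.txt.

    ```python
    from collections import Counter

    # A few illustrative Unicode block ranges (start, end inclusive).
    # A real survey tool would use the complete Blocks.txt from the UCD.
    BLOCKS = {
        "Basic Latin": (0x0000, 0x007F),
        "Arabic": (0x0600, 0x06FF),
        "Ethiopic": (0x1200, 0x137F),
    }

    def block_of(ch):
        """Return the name of the block containing ch, or 'Other'."""
        cp = ord(ch)
        for name, (lo, hi) in BLOCKS.items():
            if lo <= cp <= hi:
                return name
        return "Other"

    def block_tally(text):
        """Count characters per Unicode block in a text sample."""
        return Counter(block_of(ch) for ch in text)

    # Example: a mixed Latin/Ethiopic sample ("selam" in both scripts).
    print(block_tally("selam ሰላም"))
    ```

    Run over a crawled corpus rather than a single string, a tally like this
    would surface the under-utilized blocks and dead zones the message
    mentions.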



    This archive was generated by hypermail 2.1.5 : Wed Jul 26 2006 - 15:41:35 CDT