From: Don Osborn (firstname.lastname@example.org)
Date: Wed Jul 26 2006 - 15:36:53 CDT
Hi Daniel,
There was a workshop in Bamako last month, sponsored by the
African Academy of Languages, the Language Observatory, and the Japan
Science and Technology Agency (see
http://gii2.nagaokaut.ac.jp/giiblog/blog/lopdiary.php?itemid=715 ), which
dealt with surveying African languages on the web. According to Tunde
Adegbola (who was there, and whom you will recall from the Casablanca
workshop last year), one of the things they did was to introduce something
called the Language Identification Module (LIM).
This might answer the first question. I'll cc Tunde, and I need to write to
Yoshiki Mikami and Shigeaki Kodama about it anyway. I'll also cc A12n-forum,
where this subject came up before, in the interest of broadening the info
& dialogue on what sounds like a project of wider interest that has had
relatively little attention.
All the best.
PanAfrican Localisation Project
----- Original Message -----
From: "Daniel Yacob" <email@example.com>
Sent: Wednesday, July 26, 2006 1:01 PM
Subject: Wanted: An Internet Unicode Meter
> I was asked twice within a week recently how many Amharic documents
> were on the internet and I could only guess at a figure. So it
> dawned on me that it would be a nice service if search engine
> companies could provide some statistics based on language (if
> identified) and script. Perhaps these stats are available and
> I just wasn't able to find them?
> Going a step further, stats on a per character basis, or even a
> property basis would be useful and not just academically interesting.
> The practical application that comes to mind would be as a survey
> of Unicode usage. Under-utilized blocks, even dead zones, could be
> identified which would indicate where community outreach was needed.
> I think this would be in the Unicode Consortium's best interest to
> be aware of these stats (as well as related stats such as Unicode
> use vs other encoding systems and growth over time) to then know
> where to focus efforts in promoting adoption of the standard.
> So if the Unicode Consortium could work on a character meter with a
> major indexing/searching service, such as Google for example, that
> would be dandy. Do we know anyone at that intersection? ;-)
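A minimal sketch of the per-block tally Daniel describes, in Python. The block table below is a small hand-picked subset, and the names BLOCKS, block_of and block_counts are hypothetical; a real meter would load the full Unicode block list and run over a search engine's index rather than a single string.

```python
from collections import Counter

# A few Unicode block ranges (inclusive). This is only an illustrative
# subset; anything outside these ranges is reported as "Other".
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0600, 0x06FF, "Arabic"),
    (0x1200, 0x137F, "Ethiopic"),
]


def block_of(ch):
    """Return the name of the block containing ch, or "Other"."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def block_counts(text):
    """Count non-whitespace characters in text, grouped by block."""
    return Counter(block_of(ch) for ch in text if not ch.isspace())


if __name__ == "__main__":
    # Three Ethiopic syllables followed by two Latin letters.
    sample = "\u1230\u120b\u121d hi"
    for name, count in block_counts(sample).most_common():
        print(name, count)
```

Summing such counters across documents would give the usage-by-block figures; blocks that never appear in the totals would be candidates for the "dead zones" mentioned above.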