From: Don Osborn (firstname.lastname@example.org)
Date: Thu Jul 27 2006 - 13:08:44 CDT
Thank you, Debbie. I also found another document on the LOP from a conference
last year at http://www2005.org/cdrom/docs/p990.pdf . It has some additional
information.
----- Original Message -----
From: Debbie Garside
> Having had some contact with Professor Mikami, I can offer a brief outline
> to the best of my knowledge: the Language Observatory Project (Nagaoka
> University, Japan) employs the UbiCrawler to trawl the web, gathering
> information on languages, scripts, encodings, etc. The project aims to
> analyse this data, providing stats on the coverage of the 300 languages
> used in the Universal Declaration of Human Rights, amongst other things.
> Amharic is one of those languages.
> Debbie Garside
>> -----Original Message-----
>> From: email@example.com
>> [mailto:firstname.lastname@example.org] On Behalf Of Don Osborn
>> Sent: 26 July 2006 21:37
>> To: email@example.com; Daniel Yacob
>> Cc: Tunde Adegbola; firstname.lastname@example.org
>> Subject: Re: Wanted: An Internet Unicode Meter
>> Hi Daniel,
>> There was a workshop in Bamako last month sponsored by the
>> African Academy of Languages, the Language Observatory, and
>> the Japan Science and Technology Agency, which dealt with
>> surveying African languages on the web.
>> One of the things they did, according to Tunde Adegbola (who
>> was there and you will recall from the Casablanca workshop
>> last year) was to introduce something called the Language
>> Identification Module (LIM).
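[Language identification of the kind a module like LIM performs is commonly done by comparing character n-gram frequency profiles of a text against profiles built from known-language samples. The sketch below is purely illustrative, not the actual LIM code; all function names and the tiny training samples are made up for the example.]

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Frequency profile of character n-grams, normalised to probabilities."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

def similarity(profile_a, profile_b):
    """Shared probability mass between two profiles (higher = more alike)."""
    return sum(min(p, profile_b.get(gram, 0.0)) for gram, p in profile_a.items())

def identify(text, training_samples):
    """Pick the language whose training profile best matches the text."""
    target = ngram_profile(text)
    profiles = {lang: ngram_profile(s) for lang, s in training_samples.items()}
    return max(profiles, key=lambda lang: similarity(target, profiles[lang]))

samples = {
    "english": "the quick brown fox jumps over the lazy dog and the rain in spain",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y la lluvia",
}
print(identify("the dog and the fox", samples))  # -> english
```

[Real systems train on far larger corpora and handle encoding detection first, but the core idea — nearest profile wins — is the same.]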
>> This might answer the first question. I'll cc Tunde, and I
>> need to write Yoshiki Mikami and Shigeaki Kodama about it
>> anyway. I'll also cc the A12n-forum, where this subject came
>> up before, in the interest of broadening the info and
>> dialogue on what sounds like a project of wider interest
>> that has had relatively little attention.
>> All the best.
>> Don Osborn
>> PanAfrican Localisation Project
>> ----- Original Message -----
>> From: "Daniel Yacob" <email@example.com>
>> To: <firstname.lastname@example.org>
>> Sent: Wednesday, July 26, 2006 1:01 PM
>> Subject: Wanted: An Internet Unicode Meter
>> > Greets,
>> > I was asked twice within a week recently how many Amharic documents
>> > were on the internet, and I could only guess at a figure. So it
>> > dawned on me that it would be a nice service if search engine
>> > companies could provide some statistics based on language (if
>> > identified) and script. Perhaps these stats are available and
>> > I just wasn't able to find them?
>> > Going a step further, stats on a per-character basis, or even a
>> > per-property basis, would be useful, and not just academically.
>> > The practical application that comes to mind would be as a survey
>> > of Unicode usage. Under-utilized blocks, even dead zones, could be
>> > identified which would indicate where community outreach was needed.
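[The per-block tally Daniel describes could be sketched as below. This is a hypothetical illustration, not an existing service: it buckets characters by the first word of each code point's Unicode name (e.g. LATIN, ETHIOPIC, CYRILLIC) as a rough stand-in for script/block, since Python's standard unicodedata module exposes names but no block lookup.]

```python
import unicodedata
from collections import Counter

def script_histogram(text):
    """Rough per-script character counts, keyed by the first word of each
    code point's Unicode character name."""
    counts = Counter()
    for ch in text:
        try:
            counts[unicodedata.name(ch).split()[0]] += 1
        except ValueError:  # unnamed code points (controls, unassigned, etc.)
            counts["UNNAMED"] += 1
    return counts

# Mixed Latin / Ethiopic / Cyrillic sample:
print(script_histogram("Hello ሰላም мир"))
```

[Run over a crawled corpus instead of one string, a histogram like this would surface exactly the under-utilized blocks and dead zones mentioned above.]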
>> > I think it would be in the Unicode Consortium's best interest to
>> > be aware of these stats (as well as related stats, such as Unicode
>> > use vs. other encoding systems, and growth over time), so as to
>> > know where to focus efforts in promoting adoption of the standard.
>> > So if the Unicode Consortium could work on a character meter with a
>> > major indexing/searching service such as Google, that would be
>> > dandy. Do we know anyone at that intersection? ;-)
>> > cheers,
>> > /Daniel
This archive was generated by hypermail 2.1.5 : Thu Jul 27 2006 - 14:01:19 CDT