From: Don Osborn (dzo@bisharat.net)
Date: Thu Jul 27 2006 - 13:08:44 CDT
Thank you Debbie. I also found another document on LOP from a conference
last year at http://www2005.org/cdrom/docs/p990.pdf . It has some additional
specs.
----- Original Message -----
From: Debbie Garside
...
> Having had some contact with Professor Mikami, here is a brief outline, to
> the best of my knowledge. The Language Observatory project (Nagaoka
> University, Japan) employs a UBI Crawler to trawl the web, gathering
> information on languages, scripts, encodings, etc. The project aims to
> analyse this data and provide statistics on, amongst other things, the web
> coverage of the 300 languages used in the Declaration of Human Rights.
>
> http://www.elda.org/en/proj/scalla/SCALLA2004/mikami.pdf
>
> Amharic is one of those languages.
>
> Regards
>
> Debbie Garside
>
>> -----Original Message-----
>> From: unicode-bounce@unicode.org
>> [mailto:unicode-bounce@unicode.org] On Behalf Of Don Osborn
>> Sent: 26 July 2006 21:37
>> To: unicode@unicode.org; Daniel Yacob
>> Cc: Tunde Adegbola; a12n-forum@bisharat.net
>> Subject: Re: Wanted: An Internet Unicode Meter
>>
>> Hi Daniel, There was a workshop in Bamako last month
>> sponsored by the African Academy of Languages, the Language
>> Observatory, and the Japan Science and Technology Agency (see
>> http://gii2.nagaokaut.ac.jp/giiblog/blog/lopdiary.php?itemid=715 )
>> which dealt with surveying African languages on the web.
>> One of the things they did, according to Tunde Adegbola (who
>> was there, and whom you will recall from the Casablanca workshop
>> last year), was to introduce something called the Language
>> Identification Module (LIM).
>>
>> This might answer the first question. I'll cc Tunde and I
>> need to write Yoshiki Mikami and Shigeaki Kodama about it
>> anyway. I'll also cc A12n-forum where this subject came up
>> before - in the interests of broadening the info & dialogue
>> on what sounds like a project of wider interest that has had
>> relatively little attention.
>>
>> All the best.
>>
>> Don
>>
>> Don Osborn
>> Bisharat.net
>> PanAfrican Localisation Project
>>
>>
>> ----- Original Message -----
>> From: "Daniel Yacob" <unicode@geez.org>
>> To: <unicode@unicode.org>
>> Sent: Wednesday, July 26, 2006 1:01 PM
>> Subject: Wanted: An Internet Unicode Meter
>>
>>
>> > Greets,
>> >
>> > I was asked twice within a week recently how many Amharic documents
>> > were on the internet and I could only guess at a figure. So it
>> > dawned on me that it would be a nice service if search engine
>> > companies could provide some statistics based on language (if
>> > identified) and script. Perhaps these stats are available and
>> > I just wasn't able to find them?
>> >
>> > Going a step further, stats on a per character basis, or even a
>> > property basis would be useful and not just academically
>> > interesting.
>> > The practical application that comes to mind would be as a survey
>> > of Unicode usage. Under-utilized blocks, even dead zones, could be
>> > identified which would indicate where community outreach was needed.
>> >
>> > I think this would be in the Unicode Consortium's best interest to
>> > be aware of these stats (as well as related stats such as Unicode
>> > use vs other encoding systems and growth over time) to then know
>> > where to focus efforts in promoting adoption of the standard.
>> >
>> > So if the Unicode Consortium could work on a character meter with a
>> > major indexing/searching service, such as Google for example, that
>> > would be dandy. Do we know anyone at that intersection? ;-)
>> >
>> > cheers,
>> >
>> > /Daniel
>> >
>> >
>
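
As an illustration of the per-script and per-character statistics Daniel asks
for above, here is a minimal sketch of such a "meter" over a handful of
already-fetched documents. It is not how the Language Observatory or any
search engine does it. The names SAMPLE_BLOCKS, block_of, meter and
unused_blocks are hypothetical, and the block table is a small hand-picked
sample; a fuller survey would presumably load every range from the Unicode
Character Database file Blocks.txt.

```python
from collections import Counter
import unicodedata

# Hand-picked sample of Unicode block ranges (an assumption for illustration,
# not a complete table).
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x1200, 0x137F, "Ethiopic"),
    (0x1380, 0x139F, "Ethiopic Supplement"),
    (0x2D80, 0x2DDF, "Ethiopic Extended"),
]


def block_of(ch):
    """Return the sample block name containing ch, or "Other"."""
    cp = ord(ch)
    for start, end, name in SAMPLE_BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def meter(documents):
    """Tally non-whitespace characters per block and per general category."""
    by_block = Counter()
    by_category = Counter()
    for text in documents:
        for ch in text:
            if ch.isspace():
                continue
            by_block[block_of(ch)] += 1
            by_category[unicodedata.category(ch)] += 1
    return by_block, by_category


def unused_blocks(by_block):
    """List sample blocks with no characters seen (candidate "dead zones")."""
    return [name for _, _, name in SAMPLE_BLOCKS if by_block[name] == 0]
```

Calling meter() on a list of decoded page texts returns two counters, one
keyed by block name and one keyed by general category (a stand-in for the
"property basis" idea). unused_blocks() then lists the sample blocks that
never appeared. Growth over time could be tracked by running the same tally
on successive crawls.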