RE: Transliteration

From: [email protected]
Date: Mon Mar 06 2000 - 16:48:39 EST

Next message: Harald Tveit Alvestrand: "Re: Rationale for U+10FFFF?"
Previous message: John Jenkins: "Re: Paper on the (misnaming) of the Han"
Maybe in reply to: [email protected]: "Transliteration"

You can view (or check out) the data files online at:

http://www10.software.ibm.com/developerworks/opensource/cvs/icu

For example, the data files for the Greek transliteration rules and the
basic Greek locale data are at the following two locations:

http://www10.software.ibm.com/developerworks/opensource/cvs/~checkout~/icu/data/translit/lgreek.txt?rev=1.2&content-type=text/plain
http://www10.software.ibm.com/developerworks/opensource/cvs/~checkout~/icu/data/el.txt?rev=1.3&content-type=text/plain

(If your mail client wraps these lines, you might have to reconstruct the URLs.)

The source format for the locale data is not XML, having predated it.
However, it would be a trivial matter to convert it, and we are looking at
using XML for the source format in the future.

Mark
___
Mark Davis, IBM Center for Java Technology, Cupertino
(408) 777-5850 [fax: 5891], [email protected], [email protected]
http://maps.yahoo.com/py/maps.py?Pyt=Tmap&addr=10275+N.+De+Anza&csz=95014

[email protected] on 2000.03.06 13:03:57

To: Mark Davis/Cupertino/IBM@IBMUS
cc:
Subject: RE: Transliteration

(this message has little to do with Nokia...)

I haven't yet looked closely at ICU, but it certainly looks very interesting,
mostly because I've been contemplating doing something very similar myself.

My arena, however, is Perl (I am one of the core developers of Perl). What
I had planned to do was of course something slightly less ambitious, and
(I think; I must say "I think" because, as I said, I still haven't looked
at ICU, only read its docs and tried out the locale browser) more modular,
in that I would have had a separate Perl module that would have held only
the names of the weekdays and months (in UTF-8) for the various languages,
and that would have been completely separate from, say, a collation module.
(How would I have received the data? Well, manually and by contributions...
it would have been a long, slow project.)

Now I'm interested in what kind of format the ICU data is represented in.
If the data files were XML in UTF-8, well, they could easily be used from
any programming language that can parse UTF-8 XML. Whether they would then
prefer to use some compact native binary databases, well, that is their
concern. But having the data files (plus rules like the transliteration
rules) easily available and separate from the APIs would be great.
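
As a rough illustration of the point about language-neutral data, here is a
minimal Python sketch of reading weekday and month names from a UTF-8 XML
locale file. The element names (locale, dayNames, day, monthNames, month)
and the truncated Greek sample are assumptions made up for this example; they
are not the ICU source format, which (per the reply above) is not XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema, invented for illustration only; NOT the ICU format.
# The sample is deliberately truncated to two days and two months.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<locale id="el">
  <dayNames>
    <day>Κυριακή</day>
    <day>Δευτέρα</day>
  </dayNames>
  <monthNames>
    <month>Ιανουάριος</month>
    <month>Φεβρουάριος</month>
  </monthNames>
</locale>
""".encode("utf-8")


def load_names(xml_bytes):
    """Parse UTF-8 XML bytes and return the locale id plus name lists."""
    root = ET.fromstring(xml_bytes)
    return {
        "locale": root.get("id"),
        "days": [d.text for d in root.findall("./dayNames/day")],
        "months": [m.text for m in root.findall("./monthNames/month")],
    }


if __name__ == "__main__":
    names = load_names(SAMPLE)
    print(names["locale"], names["days"], names["months"])
```

Any language with a UTF-8-capable XML parser could consume the same file
without depending on a particular library's API, which is the separation of
data from code being asked for here.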


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT