Re: Arabic country names

From: Edward H Trager (ehtrager@umich.edu)
Date: Fri Mar 21 2003 - 12:21:35 EST


    OK, Frank,

    It took me a little while to remember where to find this kind of
    information, but now I've got it!

    You need to download IBM's very thorough "International Components for
    Unicode" library which is available under an Open Source license at:

    http://oss.software.ibm.com/icu/download/2.4/index.html

    In the "/source/data/locales" subdirectory of the distribution are text
    files providing locale information for numerous locales. For each locale,
    there is a list, among other things, of the names of countries, spelled
    out fully in that language/locale and keyed by their two-letter
    abbreviations. For the "ar.txt" file, there is a list of 18 countries,
    mainly Middle Eastern. For other languages, such as Thai ("th.txt"), the
    list of country names is much more extensive, so I assume that
    eventually the Arabic file will get updated too.

    The strings use Java-style escapes, i.e.: "EG { "\u0645\u0635\u0631" }". What I
    did to see them in a more human-readable form was to convert the files
    I wanted to look at into UTF-8 using "uniconv" from the Yudit distribution
    (www.yudit.org) and then use yudit to view them:

    %> uniconv -in ar.txt -out ar.utf8 -decode java -encode utf-8
    %> yudit ar.utf8 &

    (This works on UNIX, Linux, and presumably on Cygwin under Windows too. Of
    course, I'm sure there are lots of other ways to view the files.)
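
    As one such other way, here is a minimal Python sketch that decodes the
    \uXXXX escapes and pulls out code/name pairs. It is not part of ICU or
    Yudit. The function names are hypothetical, and the pattern assumes that
    every entry looks like the "EG" sample quoted above; the real files may
    contain other constructs that this does not handle.

```python
import re

# One Java-style escape: backslash, 'u', exactly four hex digits.
ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

# Assumed entry shape, taken from the single sample in this message:
#   EG { "\u0645\u0635\u0631" }
ENTRY = re.compile(r'\b([A-Z]{2})\s*\{\s*"([^"]*)"\s*\}')


def decode_java_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names.

    Escaped surrogate pairs are not recombined, which does not matter
    for Arabic because all of its letters are in the BMP.
    """
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def country_names(text: str) -> dict:
    """Map two-letter codes to decoded names for entries matching ENTRY."""
    return {code: decode_java_escapes(name) for code, name in ENTRY.findall(text)}


# The sample entry from ar.txt, kept as a raw string so the escapes stay literal.
sample = r'EG { "\u0645\u0635\u0631" }'
names = country_names(sample)
print(names["EG"])
```

    To run it over a whole file, read "ar.txt" into a string first and pass
    that to country_names(); the file's own encoding is something you would
    need to check before opening it.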

    Hope this helps! It looks like ICU can serve as a nice data resource in
    general, even if you don't plan on using the C++ or Java libraries
    directly in software.

    On Thu, 20 Mar 2003, Frank da Cruz wrote:

    > It would seem timely to augment the collection of native-script
    > UTF-8 country names in:
    >
    > http://www.columbia.edu/kermit/postal.html#index
    >
    > with more Arabic ones. So far, Arabic is the most under-represented
    > script. I have a few (Egypt, Iran, Tajikistan) cribbed from Tex's page
    > but would like to fill in Afghanistan, Algeria, Djibouti, Iraq, Jordan,
    > Kuwait, Lebanon, Libya, Morocco, Oman, Pakistan, Syria, etc -- any country
    > whose name is written in Arabic script. Can anyone help with this?
    >
    > Thanks!
    >
    > - Frank
    >


