Re: Request - convert ISCII to Unicode

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 21 2003 - 21:18:15 EST


    Unfortunately, charset names -- including IANA names -- are in general not
    well-defined, in the sense that for a given name there is no assurance that
    - one can access a mapping table to/from Unicode/10646 for it, and
    - that mapping table is guaranteed to represent what a vendor actually does in
    its conversion APIs.

    Thus, what we base our aliases on is a programmatic comparison of tables built
    by accessing public APIs, for a certain set of platforms (defined broadly). As
    you can imagine, this can take some effort, and depends on what machines we have
    easy access to. And as with anything that takes some substantial effort, we have
    to balance this against all the other features we could be working on (TANSTAAFL).
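
    As a rough illustration of that kind of programmatic comparison, the sketch
    below builds byte-to-Unicode tables through a public conversion API and then
    compares them. It uses Python's codecs module purely as a stand-in platform;
    the helper names build_table, diff_tables and same_mapping are hypothetical,
    the approach only covers single-byte charsets, and it is not how the ICU
    alias data is actually produced.

```python
import codecs


def build_table(charset: str) -> dict[int, str]:
    """Decode each single byte through the public codec API.

    Bytes the codec rejects (unassigned, or lead bytes of a multi-byte
    charset) are simply left out of the table.
    """
    decode = codecs.lookup(charset).decode
    table = {}
    for byte in range(256):
        try:
            text, _consumed = decode(bytes([byte]), "strict")
        except UnicodeDecodeError:
            continue
        table[byte] = text
    return table


def diff_tables(charset_a: str, charset_b: str) -> dict[int, tuple]:
    """Return {byte: (mapping_a, mapping_b)} for every byte that differs."""
    ta, tb = build_table(charset_a), build_table(charset_b)
    return {
        byte: (ta.get(byte), tb.get(byte))
        for byte in range(256)
        if ta.get(byte) != tb.get(byte)
    }


def same_mapping(charset_a: str, charset_b: str) -> bool:
    """Two names are alias candidates only if their tables are identical."""
    return not diff_tables(charset_a, charset_b)


if __name__ == "__main__":
    # Names that are often treated as interchangeable may still differ.
    for byte, (a, b) in sorted(diff_tables("iso-8859-1", "cp1252").items()):
        print(f"0x{byte:02X}: {a!r} vs {b!r}")
```

    Two names would be treated as aliases only when same_mapping reports
    identical tables on the platform being tested; the result says nothing
    about other platforms.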

    People can see the charset conversions shipped by default at
    http://oss.software.ibm.com/cgi-bin/icu/convexp, which also provides charts of
    those conversions. (If someone wants to install conversions we don't ship by
    default, they can be added, as Markus said.)

    Mark
    __________________________________
    http://www.macchiato.com
    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: "Markus Scherer" <markus.scherer@jtcsv.com>
    Cc: "Unicode@Unicode.Org" <unicode@unicode.org>
    Sent: Fri, 2003 Nov 21 16:52
    Subject: RE: Request - convert ISCII to Unicode

    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > > behalf of Markus Scherer
    > > Sent: Saturday, 22 November 2003 00:47
    > > To: unicode
    > > Subject: Re: Request - convert ISCII to Unicode
    > >
    > >
    > > Frank Yung-Fong Tang wrote:
    > > > Does the ICU ISCII converter handle the ATTRIBUTE code in ISCII (as defined
    > > > in ANNEX-E of ISCII 13194:1991, page 20) to switch between scripts?
    > > > ATR = 0xEF in ISCII
    > > > 0xEF 0x42 to switch to Devanagari script
    > > > 0xEF 0x43 to switch to Bengali script
    > > > etc...
    > >
    > > The ICU ISCII converter does handle the script-switching
    > > attributes. The default script (before
    > > encountering a script attribute) depends on the charset name you
    > > use (for example, iscii-dev vs.
    > > iscii-guj vs. iscii-tlg or x-iscii-te etc.) or can be set with an
    > > ICU-specific option suffix on the
    > > ISCII charset name itself.
    > >
    > > Search for "ISCII" in
    > >
    > > http://oss.software.ibm.com/cvs/icu/~checkout~/icu/source/data/mappings/convrtrs.txt
    >
    > Just one subsidiary question about this ICU file: will ICU support the
    > charset name aliases used in Oracle? How can we be sure which charset name
    > aliases correspond to each other between Oracle, MIME, Java...
    >
    > I know that Oracle 8i and later implement a JDK-compliant Java VM, so this
    > alias mapping should be easy to perform with Java aliases, no?
    >
    > I have the same questions for Sybase (is it the same as MS SQL, i.e. based
    > on Windows codepages?)
    >
    > Could the ICU charset converters support other well-known collections of
    > charset names and aliases besides these: UTR22, ICU, IBM, JAVA, WINDOWS,
    > GLIBC, AIX, DB2, SOLARIS, APPLE, HPUX, MIME, IANA, MSIE, ZOS_USS (MVS), ...?
    > For example, VMS collections and other RDBMS engines such as MSSQL, SYBASE,
    > ORACLE, MYSQL (?)
    >
    >
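
    To make the script-switching attribute from the quoted question concrete
    (ATR = 0xEF followed by a script code), here is a minimal sketch that splits
    an ISCII byte stream into per-script runs. It knows only the two codes
    mentioned in the thread (0x42 and 0x43), treats every byte after ATR as a
    script code (a simplification), and takes the default script as a parameter,
    much as the charset name selects it in the description above. The function
    name split_by_script is hypothetical, and this is not the ICU implementation.

```python
ATR = 0xEF  # attribute introducer, as quoted in the thread

# Only the two script codes given in the thread; any other code is reported
# as unknown rather than guessed.
SCRIPT_CODES = {0x42: "Devanagari", 0x43: "Bengali"}


def split_by_script(data: bytes, default_script: str) -> list[tuple[str, bytes]]:
    """Split ISCII bytes into (script, payload) runs at each ATR sequence.

    Assumption: every ATR is followed by a script code. A lone ATR at the
    very end of the input is kept as ordinary payload.
    """
    runs = []
    script = default_script
    current = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ATR and i + 1 < len(data):
            if current:
                runs.append((script, bytes(current)))
                current = bytearray()
            code = data[i + 1]
            script = SCRIPT_CODES.get(code, f"unknown(0x{code:02X})")
            i += 2  # skip ATR and its script code
            continue
        current.append(data[i])
        i += 1
    if current:
        runs.append((script, bytes(current)))
    return runs


if __name__ == "__main__":
    sample = bytes([0xA4, 0xEF, 0x43, 0xA5, 0xA6])
    for script, payload in split_by_script(sample, "Devanagari"):
        print(script, payload.hex())
```

    A real converter would then map each payload run to the Unicode block of
    its script; that mapping is outside the scope of this sketch.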



    This archive was generated by hypermail 2.1.5 : Fri Nov 21 2003 - 22:20:10 EST