Re: Code Pages!

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Jul 24 2003 - 14:18:43 EDT

Next message: Peter Kirk: "Re: Hebrew hataf vowels (was: About CGJ)"

Previous message: John Hudson: "Re: Hebrew hataf vowels (was: About CGJ)"
In reply to: Philippe Verdy: "Re: Code Pages!"
Next in thread: rajesh@inflibnet.ac.in: "Re: Code Pages!"

There are many codepages for Indic languages.

Modern systems support Unicode. It is what Windows, Mac OS X, Java, modern web browsers, and so on
use internally; everything else is supported via conversion, which can be problematic.
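
To illustrate how conversion can go wrong, here is a small Python sketch. It only uses the standard
codecs; the choice of cp1252 as the "legacy" codepage is an assumption made for the example, not
something specific to Indic text.

```python
# Sketch: two common ways codepage conversion can go wrong.

text = "\u00e9"                      # LATIN SMALL LETTER E WITH ACUTE
utf8_bytes = text.encode("utf-8")    # two bytes: 0xC3 0xA9

# 1. Decoding with the wrong codepage silently produces garbage.
garbled = utf8_bytes.decode("cp1252")

# 2. A legacy codepage cannot represent most of Unicode at all.
ka = "\u0915"                        # DEVANAGARI LETTER KA
try:
    ka.encode("cp1252")
    unmappable = False
except UnicodeEncodeError:
    unmappable = True

# With errors="replace" the character is lost and a "?" is written instead.
lossy = ka.encode("cp1252", errors="replace")
```

The first case gives no error at all, which is why such problems often go unnoticed until the text
is displayed.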

The ISCII standard is byte-based and stateful. (Complicated and not widely supported.) It has switch
commands to go between the Indic scripts, and it also has commands for fancy-text attributes like
"bold". The latter cannot be handled in plain-text, general-purpose codepage conversion, of course.

When there are multiple names or codepage numbers for ISCII, the only difference between them
_should_ be the default script used for conversion from ISCII to Unicode. ISCII text can contain a
mix of Indic scripts by announcing each change between script runs. The script should be announced
before the first Indic character appears in the ISCII text.
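
A rough Python sketch of what "stateful" means for a converter. This is not a real ISCII codec:
the function name decode_iscii is hypothetical, and the byte values (0xEF as the switch lead byte,
0x42/0x43/0x44 as script codes, 0x30 as a "bold" attribute) and the two-entry letter table are
assumptions for illustration that should be checked against the standard.

```python
# Toy sketch of stateful ISCII-to-Unicode decoding.
# All byte values and table entries are assumptions for illustration.

ATR = 0xEF  # assumed lead byte that introduces a switch/attribute command

# Assumed script codes (the byte after ATR) -> base of the Unicode block.
SCRIPT_BASE = {
    0x42: 0x0900,  # Devanagari
    0x43: 0x0980,  # Bengali
    0x44: 0x0B80,  # Tamil
}

# Tiny assumed table: ISCII byte -> offset inside the script's Unicode block.
OFFSET = {
    0xA4: 0x05,  # letter A
    0xB3: 0x15,  # letter KA
}


def decode_iscii(data: bytes, default_script: int = 0x42) -> str:
    """Decode a toy subset of ISCII, tracking the current script."""
    base = SCRIPT_BASE[default_script]  # state: the script in effect
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR and i + 1 < len(data):
            code = data[i + 1]
            if code in SCRIPT_BASE:
                base = SCRIPT_BASE[code]  # script announcement
            # Any other command (e.g. an assumed 0x30 "bold") is dropped:
            # plain text has no way to carry it.
            i += 2
            continue
        if b < 0x80:
            out.append(chr(b))            # ASCII passes through
        elif b in OFFSET:
            out.append(chr(base + OFFSET[b]))
        else:
            out.append("\ufffd")          # not covered by this toy table
        i += 1
    return "".join(out)
```

The same byte 0xB3 decodes to a different character depending on the script in effect, so
decode_iscii(b"\xb3") and decode_iscii(b"\xb3", default_script=0x43) give different results. That
is the only thing a separate codepage name or number would need to select.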

One problem with such complex encodings and converters is that two implementations will rarely yield
the same results, and that it is hard to document the behavior precisely.

For completeness' sake, there are dozens of Indic "font encodings", i.e., someone has drawn a font
that maps byte values to glyphs. These things are not interoperable at all. Avoid them.

Summary: Use Unicode.

Philippe Verdy wrote:
> There are also errors in IBM ICU/Openi18n resources ...

If there are errors, then please submit a bug report. If possible, please include references to
authoritative material and a patch.

Best regards,
markus

