From: Markus Scherer (email@example.com)
Date: Thu Jul 24 2003 - 14:18:43 EDT
There are many codepages for Indic languages.
Modern systems support Unicode. It is what Windows, Mac OS X, Java, modern web browsers, and so on
use internally; everything else is supported via conversion, which can be problematic.
The ISCII standard is byte-based and stateful. (Complicated and not widely supported.) It has switch
commands to go between the Indic scripts, and it also has commands for fancy-text attributes like
"bold". The latter cannot be handled in plain-text, general-purpose codepage conversion, of course.
When there are multiple names or codepage numbers for ISCII, they _should_ differ only in the
default script assumed for conversion from ISCII to Unicode. ISCII text can contain a mix of Indic scripts
by announcing each change between script runs. The script should be announced before the first Indic
character appears in the ISCII text.
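
To illustrate why this makes the encoding stateful, here is a minimal sketch that splits a byte
stream into script runs. It is not a converter. The attribute byte 0xEF, the script codes, and the
name split_script_runs are assumptions for illustration (the byte values are my reading of
ISCII-91); check them against the standard before relying on them.

```python
ATR = 0xEF  # assumed ISCII-91 attribute-introducer byte

# Assumed script codes that may follow ATR (a subset, for illustration only).
SCRIPT_CODES = {0x42: "Devanagari", 0x43: "Bengali", 0x44: "Tamil", 0x45: "Telugu"}


def split_script_runs(data: bytes, default_script: str = "Devanagari"):
    """Split an ISCII-like byte stream into (script, bytes) runs.

    The current script is state carried across bytes: the same byte value
    means a different character depending on the last announcement seen.
    """
    runs = []
    script = default_script
    buf = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR and i + 1 < len(data):
            code = data[i + 1]
            i += 2
            new_script = SCRIPT_CODES.get(code)
            if new_script is not None and new_script != script:
                if buf:
                    runs.append((script, bytes(buf)))
                    buf = bytearray()
                script = new_script
            # Any other attribute code (e.g. a fancy-text attribute such as
            # "bold") is dropped: plain text cannot represent it.
            continue
        buf.append(b)  # a lone ATR at the very end is kept as a data byte
        i += 1
    if buf:
        runs.append((script, bytes(buf)))
    return runs


# One byte in the default script, a switch to Bengali, then an attribute
# code (0x30) that is not a script code and is therefore dropped.
sample = bytes([0xB3, 0xEF, 0x43, 0xB3, 0xB4, 0xEF, 0x30, 0xB5])
runs = split_script_runs(sample)
# runs == [("Devanagari", b"\xb3"), ("Bengali", b"\xb3\xb4\xb5")]
```

Note that the result for the first byte depends entirely on default_script, which is the one thing
the different ISCII names or codepage numbers should select.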
One problem with such complex encodings and converters is that two implementations will rarely yield
the same results, and that it is hard to document the behavior precisely.
For completeness' sake, there are dozens of Indic "font encodings", i.e., someone has drawn a font
that maps byte values to glyphs. These things are not interoperable at all. Avoid them.
Summary: Use Unicode.
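
As a small illustration of the difference (a sketch, using only Python's standard unicodedata
module): in Unicode each Indic script has its own code points, so a string can mix scripts with no
switch commands and no default script.

```python
import unicodedata

# Devanagari KA followed by Bengali KA: two different code points,
# no script announcement needed between them.
text = "\u0915\u0995"
names = [unicodedata.name(ch) for ch in text]
# names == ["DEVANAGARI LETTER KA", "BENGALI LETTER KA"]
```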
Philippe Verdy wrote:
> There are also errors in IBM ICU/Openi18n resources ...
If there are errors, then please submit a bug report. If possible, please include references to
authoritative material and a patch.