On Fri, Apr 13, 2001 at 11:32:16AM -0700, Markus Scherer wrote:
> It looks to me like the "Cp" names might be IBM CCSIDs. For those, have a look at the "ibm-" names in ICU's alias table at http://oss.software.ibm.com/cvs/icu/~checkout~/icu/data/convrtrs.txt
>
> Note that ICU uses "cp" to mean Microsoft codepage numbers.
>
> Note also that even IBM changes some of its tables over time and has in a few dozen cases multiple Unicode<->codepage tables per CCSID (see our entries for ibm-943 and ibm-1363).
>
> "Haphazard" is a good description of the situation...
> It is easy to have "repertoires" - the hard part is to have "one repertoire". The situation is beyond repair, although we (ICU) are still collecting and publishing data. Use Unicode, UTFs, SCSU.
>
> markus
>
> Mike Brown wrote:
> ...
> > I should not be surprised by your statement, but I am. It is distressing to
> > think that something that by definition should not be rocket science --
> > repertoires of abstract characters mapped directly to specific bit patterns
> > -- would be subject to such haphazard definition and even more haphazard
> > implementation.
The ISO charmap registry has unique naming of encodings that does not
change and that is aligned with the IANA registry. See http://www.dkuug.dk/cultreg
Keld
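
The two problems raised in this thread, several labels for one table and several tables behind labels that look interchangeable, can be seen in a small sketch. It uses Python's built-in codec registry as a stand-in; it is not ICU's alias table or the ISO/IANA registries discussed above, and the helper names canonical_name and compare_decodings are hypothetical, chosen only for this illustration.

```python
import codecs


def canonical_name(label):
    """Return the name Python's codec registry resolves a label to."""
    return codecs.lookup(label).name


def compare_decodings(data, encodings):
    """Decode the same bytes under each encoding.

    Returns a dict mapping each encoding label to the list of
    resulting code points, formatted as U+XXXX strings.
    """
    return {
        enc: [f"U+{ord(ch):04X}" for ch in data.decode(enc)]
        for enc in encodings
    }


if __name__ == "__main__":
    # Different labels can resolve to the same converter.
    for label in ("ibm437", "cp437", "windows-1252", "cp1252"):
        print(label, "->", canonical_name(label))

    # The same two bytes can map to different code points under
    # two closely related Japanese tables.
    print(compare_decodings(b"\x81\x60", ["shift_jis", "cp932"]))
```

Which labels are treated as aliases, and which table sits behind each one, is a decision of the particular library; another converter may resolve the same names differently.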