Re: PUAs and other charsets

From: Kenneth Whistler
Date: Thu Feb 24 2000 - 22:14:08 EST

John Cowan asked:

> Should a coded character set with its own PUA define a mapping between
> its PUA and the Unicode PUA as part of its mapping to Unicode?
> The mappings on the Unicode FTP site don't, but I don't know why.

and Peter Constable replied:

> ?? How can the undefinable be defined? Or do you just mean,
> "map this block - whatever it happens to contain - onto that
> block"?

In principle, no mapping can be defined, since no standard characters
are defined there, and the mappings are intended as mappings of *characters*
to *characters*, not of *code points* to *code points*.

However, there are some interesting exceptions in the implementation
of character mapping tables. When Unicode is used as a pivot for mapping
between legacy character sets, there are instances of intentional matches
between UDC (user-defined character) usages in the legacy character sets.
The most important such case comes in matched Asian character encodings
for EBCDIC and PC/Windows code pages. Some systems are designed so that
the UDCs will survive interchange between the host and the client systems,
converting back and forth between the EBCDIC PUA and the PC PUA. IBM has a
number of code pages where the PUAs of such pairs are mutually conformant.
In such instances, a pivot conversion through Unicode may well want to
define a direct mapping for both legacy character sets into the Unicode
PUA. That way the pivot conversion can mimic the results of a direct
mapping between the legacy character sets.
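
To make the pivot idea concrete, here is a minimal sketch in Python. It is
not taken from any real code page: the UDC start values, the slot count, and
all function names are assumptions chosen for illustration, and real vendor
tables are generally not contiguous and would be enumerated entry by entry.
The only point is that if both legacy sets map slot i of their UDC area to
the same Unicode PUA code point, the pivot conversion gives the same answer
as the direct host-to-PC table.

```python
# Hypothetical UDC ranges; these values are assumptions, not real code page data.
HOST_UDC_START = 0x6941      # assumed first UDC code in the host (EBCDIC-style) set
PC_UDC_START = 0xF040        # assumed first UDC code in the PC set
UDC_COUNT = 16               # assumed number of UDC slots, kept small on purpose
UNICODE_PUA_START = 0xE000   # start of the Unicode BMP Private Use Area


def build_tables(legacy_start, count, pua_start=UNICODE_PUA_START):
    """Return (to_unicode, from_unicode) dicts pairing UDC slot i with PUA slot i."""
    to_unicode = {legacy_start + i: pua_start + i for i in range(count)}
    from_unicode = {u: c for c, u in to_unicode.items()}
    return to_unicode, from_unicode


host_to_uni, uni_to_host = build_tables(HOST_UDC_START, UDC_COUNT)
pc_to_uni, uni_to_pc = build_tables(PC_UDC_START, UDC_COUNT)


def host_to_pc_direct(code):
    """Direct vendor-style mapping: host UDC slot i goes to PC UDC slot i."""
    return PC_UDC_START + (code - HOST_UDC_START)


def host_to_pc_via_pivot(code):
    """The same conversion, routed through the Unicode PUA as a pivot."""
    return uni_to_pc[host_to_uni[code]]


def pc_to_host_via_pivot(code):
    """Reverse direction through the same pivot."""
    return uni_to_host[pc_to_uni[code]]


# The pivot mimics the direct mapping for every UDC slot.
assert all(host_to_pc_direct(c) == host_to_pc_via_pivot(c) for c in host_to_uni)
```

Nothing here says what any of those code points *mean*; the tables only
guarantee that whatever a user put in slot i on one side arrives in slot i
on the other.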

Note, however, that such mappings of PUAs are the result of implementations
of mapping tables to deal with particular usages of vendor code pages.
It would be quite a different thing for the Unicode Consortium to
start generally suggesting particular mappings of PUAs into the
Unicode PUA. That smacks of attempting to standardize PUA usage -- which
the Unicode Consortium cannot and will not do.

