L2/01-434

Apple Comments on UTR #22

Peter Edberg, Deborah Goldsmith
October 31, 2001

Apple would like to adopt the character mapping language described in 
UTR #22, but has encountered the following problems, to which we would 
like to draw attention:

1. No way to make mappings dependent on contextual information

When mapping from Unicode to other character sets, the mapping can 
depend on:

A. The resolved direction of a particular character (examples: Unicode 
to MacArabic, MacHebrew)

B. The joining (contextual shaping) context of a character (example: 
Unicode to DOS Arabic or other presentation-form encodings)

Mappings to Unicode can also depend on context. For example, loose 
mappings from MacArabic analyze number context to determine whether to 
map to the Unicode digits U+0030..U+0039 or U+0660..U+0669. Mappings 
from Indic encodings perform special context handling for virama.
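
To illustrate why a context-free table cannot express such a mapping, 
the following is a minimal sketch of a context-dependent digit mapping. 
It is not Apple's actual algorithm: the rule (use U+0660..U+0669 when 
the nearest preceding strong character is an Arabic letter) and the 
function name `map_digits_by_context` are assumptions made for 
illustration, and the input is an already-decoded string in which 
ASCII digits stand in for the source encoding's digit codes.

```python
import unicodedata

ARABIC_INDIC_ZERO = 0x0660  # U+0660 ARABIC-INDIC DIGIT ZERO


def map_digits_by_context(text):
    """Map ASCII digits to U+0660..U+0669 when in Arabic context.

    Hypothetical rule: the context is set by the nearest preceding
    character with a strong bidi class (AL = Arabic letter, L or R =
    other strong character). Neutrals and digits leave it unchanged.
    """
    out = []
    arabic_context = False
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "AL":
            arabic_context = True
        elif bidi in ("L", "R"):
            arabic_context = False
        if "0" <= ch <= "9" and arabic_context:
            out.append(chr(ARABIC_INDIC_ZERO + ord(ch) - ord("0")))
        else:
            out.append(ch)
    return "".join(out)
```

The same input digit yields two different outputs depending on what 
precedes it, which is exactly the information a one-to-one mapping 
entry has no place to record.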

2. Limited fallback level support

UTR #22 has, in effect, only two levels: strict mappings and fallback 
mappings. Apple's text encoding conversion engine distinguishes a third 
level, loose mappings (which are semantically correct but do not 
provide roundtrip capability).
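
The three levels can be sketched as follows. This is a minimal 
illustration under stated assumptions, not the design of Apple's 
engine or of UTR #22: the names `Level`, `TABLE` and `encode_char` are 
hypothetical, and the table entries are illustrative values for an 
unnamed target encoding.

```python
from enum import IntEnum


class Level(IntEnum):
    STRICT = 0    # roundtrip mapping
    LOOSE = 1     # semantically correct, but not roundtrip
    FALLBACK = 2  # best-effort substitute


# Illustrative entries only: Unicode character -> (target byte, level).
TABLE = {
    "\u00C5": (0x81, Level.STRICT),    # LATIN CAPITAL LETTER A WITH RING
    "\u212B": (0x81, Level.LOOSE),     # ANGSTROM SIGN, same target byte
    "\u00B9": (0x31, Level.FALLBACK),  # SUPERSCRIPT ONE -> digit one
}


def encode_char(ch, max_level):
    """Return the target byte, or None if no mapping is allowed.

    A mapping is used only if its level does not exceed max_level.
    """
    entry = TABLE.get(ch)
    if entry is None or entry[1] > max_level:
        return None
    return entry[0]
```

A caller that needs roundtrip fidelity would pass `Level.STRICT`; one 
that must preserve meaning but can tolerate loss of roundtrip would 
pass `Level.LOOSE`, which a two-level scheme cannot distinguish from 
full fallback.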

3. No sharing of data between similar encodings

Unless we missed something, there does not seem to be a way in the UTR 
#22 scheme to share the common mappings among multiple related 
encodings. For example, there are many Shift-JIS variants that add characters 
beyond the standard set. It would be nice to have the standard set in a 
table that can be shared among all the variants, each of which would 
then just supply a set of additional mappings (or perhaps override a few 
from the standard set). Perhaps this could be done as an enhancement to 
the versioning mechanism.
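
The kind of sharing we have in mind can be sketched as a base table 
plus a per-variant delta. This is only an illustration of the idea, 
not a proposed syntax for UTR #22: the names `BASE`, `VARIANT_DELTA` 
and `variant` are hypothetical, and the few entries shown are 
illustrative rather than a complete or authoritative table.

```python
from collections import ChainMap

# Shared table: mappings common to all variants (illustrative subset).
BASE = {
    0x5C: "\u00A5",    # YEN SIGN in the base set
    0x82A0: "\u3042",  # HIRAGANA LETTER A
}

# One variant supplies only its differences from the shared table.
VARIANT_DELTA = {
    0x5C: "\u005C",    # override: REVERSE SOLIDUS instead of YEN SIGN
    0x8740: "\u2460",  # addition: CIRCLED DIGIT ONE
}

# Lookups consult the delta first, then fall through to the base.
variant = ChainMap(VARIANT_DELTA, BASE)
```

Each variant then carries only its additions and overrides, and a 
correction to the shared table reaches every variant that refers to 
it.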