L2/01-434

Apple Comments on UTR #22

Peter Edberg, Deborah Goldsmith
October 31, 2001

Apple would like to draw attention to the following issues with the character mapping language described in UTR #22. We would like to adopt the language described there, but have encountered the following problems:

1. No way to make mappings dependent on contextual information

When mapping from Unicode to other character sets, the mapping can depend on:

A. The resolved direction of a particular character (examples: Unicode to MacArabic, MacHebrew)

B. The contextual form linking context for a character (example: Unicode to DOS Arabic or other presentation-form encodings)

Mappings to Unicode can also depend on context. For example, loose mappings from MacArabic perform an analysis of number context to determine whether to map to the Unicode 0030-39 digits or the Unicode 0660-69 digits. Mappings from Indic encodings do some special context handling with virama.

2. Limited fallback level support

UTR 22 just has, in effect, strict mappings and fallback mappings. Apple's text encoding conversion engine distinguishes a third level, loose mappings (which are semantically correct but do not provide roundtrip capability).

3. No sharing of data between similar encodings

Unless we missed something, there does not seem to be a way in the UTR 22 scheme to share the common mappings from multiple related encodings. For example, there are many Shift-JIS variants which add characters beyond the standard set. It would be nice to have the standard set in a table that can be shared among all the variants, each of which would then just supply a set of additional mappings (or perhaps override a few from the standard set). Perhaps this could be done as an enhancement to the versioning mechanism.
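
The sharing arrangement requested in item 3 can be illustrated with a minimal sketch. It is not a proposal for UTR #22 syntax and does not describe Apple's conversion engine. The names STANDARD_SJIS, VARIANT_A, and make_variant are hypothetical, and the handful of table entries are illustrative values chosen for the example rather than data taken from any mapping file. The sketch assumes a variant is fully described by a shared base table plus a small set of additions and overrides.

```python
from collections import ChainMap

# Hypothetical shared base table: byte sequence -> Unicode code point.
# Entries are illustrative only, not a complete or authoritative mapping.
STANDARD_SJIS = {
    b"\x5c": 0x00A5,      # YEN SIGN
    b"\x7e": 0x203E,      # OVERLINE
    b"\x82\xa0": 0x3042,  # HIRAGANA LETTER A
}


def make_variant(base, additions):
    """Layer a variant's own entries over a shared base table.

    Lookups consult `additions` first and fall through to `base`,
    so the base table is stored once and never copied or modified.
    """
    return ChainMap(additions, base)


# Hypothetical variant: overrides one base mapping and adds one character.
VARIANT_A = make_variant(STANDARD_SJIS, {
    b"\x5c": 0x005C,      # override: REVERSE SOLIDUS instead of YEN SIGN
    b"\x87\x40": 0x2460,  # addition beyond the standard set
})


def to_unicode(table, seq):
    """Map one encoded byte sequence to a one-character string."""
    return chr(table[seq])
```

With this layering, to_unicode(VARIANT_A, b"\x82\xa0") is answered by the shared table, while b"\x5c" and b"\x87\x40" are answered by the variant's own entries. The base table stays unchanged, so any number of variants can reference it.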