This is a joint expert contribution by Erkki I. Kolehmainen of Finland
(SFS), Marc Wilhelm Küster of Germany (DIN) and Thorgeir Sigurdsson of
This contribution addresses the need to provide additional identification and specification facilities in ISO/IEC 10646-1. This need was clearly demonstrated by the initial handling of the Proposal to Add Lithuanian Accented Letters to ISO/IEC 10646 (SC2/WG2 N2075R) at the WG2 meeting of September 1999 in Copenhagen. This document assumes that new precomposed characters will not be added to the 10646 if they can be decomposed into existing characters, although it does not necessarily endorse that policy.
The 10646 is called the Universal Character Set for a reason, presumably because it is meant to cover all the characters of all the scripts of the world. Although the UCS still has a long way to go before it is universally implemented, the names and identifiers used in the 10646 have already become widely used, authoritative and unambiguous references to characters in a number of standards and related documentation.
If there is a need to identify a collection of, e.g., all the characters in the Finnish, German or Icelandic alphabet, this can be done in the 10646 itself. In the case of, e.g., the Lithuanian alphabet, there is presently no way to define either a comparable list or the composition of all its characters. In this particular case, the omission would be grossly unfair to a hitherto oppressed language trying to gain worldwide recognition and IT support.
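The difference between the two cases can be illustrated with a small Python sketch. The choice of letter (a with ogonek and tilde) is an assumption made for illustration and is not taken from this contribution; the result reflects the Unicode character data shipped with the Python interpreter, not the state of the 10646 at the time of writing.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as four-digit hexadecimal strings."""
    return [f"{ord(ch):04X}" for ch in text]

# Illustrative letter (an assumption, not named in this contribution):
# a + COMBINING OGONEK + COMBINING TILDE.
accented = unicodedata.normalize("NFC", "a\u0328\u0303")

# For comparison, a German letter: u + COMBINING DIAERESIS.
umlaut = unicodedata.normalize("NFC", "u\u0308")

# The German letter composes to a single character; the accented
# letter keeps a trailing combining mark after composition.
print(code_points(umlaut))    # ['00FC']
print(code_points(accented))  # ['0105', '0303']
```

A list of single character identifiers can therefore name the first letter but not the second.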
We propose that an Annex be added to the 10646, listing character names together with composition identifiers and sequences for characters that cannot be precomposed under the new rules, if those rules are enforced. We further propose that a collection may be a mixture of character identifiers and composition identifiers.
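As a rough sketch of what such a mixed collection could look like in practice, the following Python fragment models each member as a tuple of code points: a tuple of length one stands for a character identifier, a longer tuple for a composition sequence. The representation, the `<0105, 0303>` notation, the names `COLLECTION`, `describe` and `split_members`, and the sample members are all hypothetical and are not part of this proposal's text.

```python
import unicodedata

# Hypothetical mixed collection: single characters and one sequence.
COLLECTION = [
    (0x0061,),         # character identifier
    (0x0105,),         # character identifier
    (0x0105, 0x0303),  # composition sequence (no single code point)
]

def describe(member):
    """Return an identifier string and a name built from the parts."""
    names = " + ".join(unicodedata.name(chr(cp)) for cp in member)
    if len(member) == 1:
        return f"{member[0]:04X}", names
    return "<" + ", ".join(f"{cp:04X}" for cp in member) + ">", names

def split_members(text, collection):
    """Split text into collection members, longest match first.

    Returns None if some part of text is not covered by the collection.
    The text is assumed to be in composed (NFC) form already.
    """
    members = sorted(collection, key=len, reverse=True)
    cps = [ord(ch) for ch in text]
    result, i = [], 0
    while i < len(cps):
        for member in members:
            if tuple(cps[i:i + len(member)]) == member:
                result.append(member)
                i += len(member)
                break
        else:
            return None  # no member matches at position i
    return result

for member in COLLECTION:
    print(*describe(member), sep="  ")
```

Matching the longest member first keeps a sequence such as 0105 0303 from being read as the single character 0105 followed by an unlisted combining mark.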
Erkki I. Kolehmainen
Marc Wilhelm Küster