L2/02-143

From: Randall K. Barry [mailto:rbar@loc.gov]
Sent: Friday, February 08, 2002 3:06 PM
To: Cathy Wissink
Cc: Sally McCallum; Patricia Harris; Michael Everson; Asmus Freytag; John Jenkins; Joan Aliprand; Mike Ksar
Subject: Liaison report from ISO TC46/SC4/WG1

Cathy,

I had hoped to have a document ready for next week's meeting that could serve as a liaison report from ISO TC46/SC4/WG1 (Character Sets) on the characters from TC46 sets that cannot yet be mapped to Unicode or ISO/IEC 10646. After re-examining the most recent documentation I have on the problem, however, I realize that the task is much larger than I remembered.

There are still 103 characters in ISO TC46 bibliographic character sets that have no good mappings to Unicode. This count does not include the 90 Glagolitic characters, which should map largely to a draft set planned for one of the other planes (plane 2?). The 103 characters we have been unable to map break down as follows:

37 characters from ISO 5426-2 (Extended Latin: Part 2)
    Mostly unusual Latin letters used in obscure languages or in manuscripts.
14 characters from ISO 6630 (Bibliographic Control Characters)
    These are the most worrisome for the library community, since some of them are control characters that are as important to us as some of the old ASCII controls in the 0001-001F range are in Unicode.
8 characters from ISO 8957 (Hebrew script)
    These are mostly less well-known points or marks.
44 characters from ISO 10754 (Extended Cyrillic for Non-Slavic)
    These are almost exclusively Cyrillic letters used in languages of the Asian part of the former Soviet Union, where, during the last century, existing Cyrillic letters were modified to represent sounds that are not part of Slavic phonology.

ISO TC46/SC4 originally asked for these characters to be included in ISO/IEC 10646 and Unicode as compatibility characters.
Implementation of these characters is at best limited (and difficult to verify in all cases), but it should not be overlooked. This is particularly true of the characters from ISO 6630, some of which are referenced in MARC, the widely used standard for encoding bibliographic data.

Prior discussions of this issue outside TC46 seem to have summarily dismissed these remaining characters, requiring TC46 to apply the rigorous provisions of the character/glyph model to all of them. Clearly, many of these characters would not pass some of the tests set out in the character/glyph model, but the same is true of many characters already in ISO/IEC 10646 and Unicode that were admitted as compatibility characters.

Before going any further with treating these characters one by one, I was hoping you might be able to discuss the possibility of our following the "compatibility character" route to addition. I recognize that a few characters might HAVE to be rejected. This, however, should be the exception, not the rule, since all of these characters are in existing ISO standards that have recently been reaffirmed.

Your help and consideration in this matter during your meeting next week would be greatly appreciated by the member bodies of ISO TC46/SC4.

Randall Keigan Barry
Library of Congress