L2/02-143

From: Randall K. Barry [mailto:rbar@loc.gov]
Sent: Friday, February 08, 2002 3:06 PM
To: Cathy Wissink
Cc: Sally McCallum; Patricia Harris; Michael Everson; Asmus Freytag;
John Jenkins; Joan Aliprand; Mike Ksar
Subject: Liaison report from ISO TC46/SC4/WG1


Cathy,

I had hoped to have a document prepared for the meeting next week that
could serve as a kind of liaison report from ISO TC46/SC4/WG1 (Character
Sets) on the characters from TC46 sets which are not yet mappable to
Unicode/ISO-IEC 10646, but after re-examining the most recent
documentation I had on the problem, I realize that the task is much larger
than I had remembered.  There are still 103 characters in ISO TC46
bibliographic character sets which have no good mappings to Unicode.  This
does not take into account the 90 Glagolitic script characters which
should map substantially to a draft set planned for one of the other
planes (plane 2?).  The breakdown of the 103 characters which we have been
unable to map is as follows:

37 characters from ISO 5426-2 (Extended Latin: Part 2)
   Mostly unusual Latin letters used in obscure languages or
   in manuscripts

14 characters from ISO 6630 (Bibliographic Control Characters)
   These are the most worrisome for the library community since in some
   cases there are control characters which are as important to us
   as some of the old ASCII controls in the 0001-001F range in Unicode

8  characters from ISO 8957 (Hebrew script)
   These are mostly lesser-known points or marks.

44 characters from ISO 10754 (Extended Cyrillic for Non-Slavic)
   These are almost exclusively Cyrillic script letters used in
   languages of the Asian part of the former Soviet Union, which
   during the last century modified existing Cyrillic letters to
   represent sounds that are not part of Slavic phonology.

ISO TC46/SC4 had originally asked for these characters to be included
in ISO-IEC 10646 and Unicode as compatibility characters.  Their
implementation is at best limited (and difficult to verify in all cases),
but should not be overlooked.  This is particularly true of characters
from ISO 6630, some of which are referenced in MARC, the widely used
standard for encoding bibliographic data.

Prior discussions of this outside of TC46 seem to have summarily dismissed
these remaining characters, requiring that TC46 apply the rigorous
provisions of the character/glyph model to them all.  Clearly, many of
these characters would not pass some of the tests set out in the
character/glyph model, but the same is true of many characters already in
ISO-IEC 10646/Unicode, which were admitted as compatibility characters.

Before going any further with the process of treating these characters one
by one, I was hoping you might be able to discuss the possibility of our
following the "compatibility character" route to addition.  I recognize
that in some cases a few characters might HAVE to be rejected.  This,
however, should be the exception, not the rule, since all these characters
are in existing ISO standards which have been recently reaffirmed.

Your help and consideration in this matter during your meeting next week
would be greatly appreciated by the member bodies of ISO TC46/SC4.


					Randall Keigan Barry
					Library of Congress