20 May, 1994

I'd like to make two clarifications to the Andy Feibus column
that discussed Unicode in the April 25, 1994 edition of _Open
Systems Today_.

1. Mr. Feibus' statement that "ISO incorporated this work
[Unicode] into ISO 10646 . . . ", while true in itself, misses
an important aspect of the history of ISO/IEC 10646-1:1993 and
Unicode Version 1.1: the ISO 10646 working group and the
Unicode Consortium agreed to merge their respective codes into
one code.

Let me give you some background.  In 1991, the ISO working
group developing 10646 and the Unicode Consortium had the same
goal of developing a multi-byte code for the world's
characters but were proceeding along divergent paths.  In
particular, the coding philosophies and implementations were
incompatible so that conversion between the two codes would
be, at best, extremely difficult.  At the time, it was clear
(a) that several vendors were already writing Unicode support
into their software and (b) that many countries would require
compliance with the future 10646 international standard in
their procurements.  Consequently, many information system
vendors were deeply concerned about the expense of supporting
two incompatible multi-byte codes.

I represent the SHARE organization, which is an association of
IBM customers, to the U.S. X3L2 technical standards committee
for codes and character sets (for example, ASCII, X3.4:1986).
SHARE members had too many difficulties converting between the
100-character 7-bit ASCII and 8-bit EBCDIC codes to want to
face the conversion problems inherent in two multi-byte codes
with character sets of at least 65,000
characters.  Although some people disagreed with my rhetoric,
I termed the coming situation "a disaster for the information
industry."  What resulted from these customer and vendor
concerns was one of the little-known successes in our
industry.

Starting in May, 1991, people from the Unicode Consortium and
the ISO working group met three times to negotiate an
agreement to merge the two codes.  After much give and take on
both sides, they agreed on a merger of features from both
Unicode V1.0 and the ISO/IEC draft international standard
10646:1990.  In 1992, the national standards organizations
voted to adopt the second version of the ISO/IEC 10646 draft
international standard (the merged code) as an international
standard.  ISO published the standard as ISO/IEC 10646-1:1993
and it is available from ANSI (the American National Standards
Institute) in New York City (phone 212/642-4900).  Meanwhile,
the Unicode Consortium modified its code so that Unicode V1.1
complies with the 2-byte form of the ISO/IEC 10646-1 standard.
Since then, the ISO working group and the Unicode Consortium
have continued to cooperate on the enhancement of the
standard.  Information about the Unicode Consortium may be
obtained by calling 408/777-5870 or via e-mail from
unicode-inc@unicode.org.

Without the merger, our computers would have wasted countless
cycles converting between Unicode and 10646.

The main point, again, is that although the ISO/IEC 10646-1
standard incorporated most of the Unicode V1.0 features (as
stated in the column), it also included important features of
earlier draft versions of 10646.  One was the ability of 10646
to encode up to about 2 billion characters (with its 4-byte form).
Although the Unicode code-space (that is, the 2-byte form of
10646) comprising 65,000 characters is likely sufficient for
the commercial market, the larger coding space available with
the 4-byte form of 10646 will provide bibliographers and
scholars access to the characters in rare and dead writing
systems.  The Unicode Consortium's latest estimate for the
number of characters to be encoded is around 250,000
characters.  This is a far cry from the 65,000-character
code-space limitation of the original Unicode V1.0.
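
The arithmetic behind that comparison can be sketched in a few
lines of Python.  This is only an illustration: the function
name is hypothetical, and the 250,000 figure is the estimate
quoted above, not a measured count.

```python
# A 2-byte code has 16 bits per character, so 2**16 positions.
TWO_BYTE_POSITIONS = 2 ** 16

# Estimate quoted in this letter for the characters to be encoded.
ESTIMATED_CHARACTERS = 250_000


def bits_needed(count):
    """Smallest number of bits that gives `count` distinct codes."""
    return max(1, (count - 1).bit_length())


print(TWO_BYTE_POSITIONS)
print(bits_needed(ESTIMATED_CHARACTERS))
print(ESTIMATED_CHARACTERS <= TWO_BYTE_POSITIONS)
```

Under these assumptions the estimate needs more than 16 bits
per character, which is why it cannot fit in the 2-byte form
alone.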

2. Mr. Feibus also stated, "Unicode specifies that strings be
stored in their natural order".  That is true, but he
continued with "--for instance, Hebrew from right to left,
Latin languages from left to right."  The continuation is
right in that some writing systems are right to left and
others are left to right, but the _ordering_ of characters in
the computer is always the same.  Unicode and 10646 store
character strings in all languages in the same (natural) order
from the first character in the string to the last one.
However, the rendering process that displays and prints the
character string must decide that strings of Hebrew characters
are rendered from right to left on the screen or on the paper,
and that strings of Latin characters are rendered from left to
right.  In summary, storage of Unicode and 10646 strings is
from first character to last, but the rendering process makes
the right-to-left and left-to-right distinctions based on the
writing system.
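
The distinction between storage and rendering can be shown
with a short Python sketch.  It relies on Python's standard
unicodedata module, which reflects a later version of the
Unicode character data than the V1.1 discussed here.  The
function toy_visual_order is a hypothetical stand-in for a
renderer and handles only a single run of characters; real
bidirectional layout is considerably more involved.

```python
import unicodedata

# The Hebrew word "shalom" in storage order:
# shin, lamed, vav, final mem.
HEBREW = "\u05E9\u05DC\u05D5\u05DD"
LATIN = "abc"


def directions(text):
    """Return the bidirectional class of each stored character."""
    return [unicodedata.bidirectional(ch) for ch in text]


def toy_visual_order(text):
    """Hypothetical renderer step for one run of characters.

    Returns the characters in left-to-right screen order.  The run
    is reversed only if every character is strongly right-to-left.
    """
    if text and all(d == "R" for d in directions(text)):
        return text[::-1]
    return text


# Storage runs from first character to last in both scripts.
assert HEBREW[0] == "\u05E9"
assert LATIN[0] == "a"

print(directions(HEBREW))
print(directions(LATIN))
```

In both strings the first character read is the first one
stored.  Only the renderer step consults the direction class,
which matches the division of labor described above.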

These represent my personal views rather than those of SHARE, the
Applied Physics Laboratory or the standards committee.

Edwin Hart
Chairman of the U.S. X3L2 technical standards committee for codes
and character sets
The Johns Hopkins University Applied Physics Laboratory
Laurel, MD  20723-6099
Edwin.Hart@jhuapl.edu
