Re: Questions about Unicode history

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 30 2002 - 15:18:35 EST


Marco,

I'll answer as many of your questions as I can, and will
cc this to the unicode list (in part to forestall a gazillion
"Well, I think maybe X" responses).

--Ken

> - When did the Unicode project start, and who started it?

The detailed history for this will soon be available on the
Unicode website. The short answer is that Joe Becker (Xerox) and
Lee Collins (Apple) were highly instrumental in getting the
ball rolling on this, and the preliminary work they did,
primarily on Han unification, dated from 1987.

However, "the Unicode project" had many beginnings -- many points
where you could mark a milestone in its early development. And
the Unicode Consortium celebrated a number of 10-year
anniversaries, starting from 1998 and continuing through last year.

>
> - Is it true Han Unification was the core of Unicode, and the idea of an
> universal encoding come afterwards?

The effort by Xerox and Apple to do a Han unification was key to
the motivation that eventually led to a serious effort to actually
*do* Unicode and then to establish the Unicode Consortium to
standardize and promote it. However, the idea of a universal encoding
predated that considerably. In some respects the Xerox Character Code
Standard (XCCS) was a serious attempt at providing a universal
character encoding (although it did not include a unified Han
encoding, but only Japanese kanji). XCCS 2.0 (1980) contained, in
addition to Japanese kanji: Latin (with IPA), Hiragana, Bopomofo, Katakana,
Greek, Cyrillic, Runic, Gothic, Arabic, Hebrew, Georgian, Armenian,
Devanagari, Hangul jamo, and a wide variety of symbols. The early
Unicoders mined XCCS 2.0 heavily for the early drafts of Unicode 1.0,
and always regarded it as the prototype for a universal encoding.

Additionally, you have to consider that the beginning of the ISO project
for a Multi-octet Universal Character Set (10646) predated the
formal establishment of Unicode. Part of the impetus for the serious
work to standardize Unicode was, of course, discontent with the
architecture of the early drafts of 10646 as it then stood.

>
> - Who and when invented the name "Unicode"?

This one has a definitive answer: Joe Becker coined the term,
for "unique, universal, and uniform character encoding", in 1987.
First documented use is in December, 1987.

>
> - When did the ISO 10646 project start?

Unfortunately, the document register for early WG2 documents doesn't
have dates for all the early documents, and I don't have all the
early documents to check. But...

The 4th meeting of WG2 was held in London in February, 1986. The
first three meetings were in Geneva, Turin, and London, respectively.
That puts the likely timeframe for the Geneva meeting, and the
establishment of WG2 by SC2 at about 1984. The *only* project for WG2
was 10646.

Some of the older oldtimers on the list may have more exact information
about the early WG2 work.

>
> - When did Unicode and ISO 10646 merge?

It wasn't a single date that can be pointed to, like the signing
of an armistice. In some respects, Unicode and ISO 10646 are *still*
merging, as modifications and amendments to deal with niggling little
architectural edge cases are worked out.

However the key dates were:

January 3, 1991. Incorporation of the Unicode Consortium, which
   signalled to SC2 that the Unicoders were serious in their
   intentions.

May, 1991. Meeting #19 of WG2 in San Francisco. An ad hoc meeting
   took place between WG2 members and some Unicoders, which paved
   the way for the later "merger" of the standards.

June, 1991. The 10646 DIS 1 was defeated in its balloting. This left
   an architectural compromise with the Unicode Standard, which at
   that point was in copy edit and about to go to press, as the only
   reasonable way forward.

June 3, 1991. The date of "10646M proposal draft to merge Unicode and
   10646", by Ed Hart. This was a key document in the resulting
   merger of features.

August, 1991. The Geneva WG2 meeting accepted Han unification and
   combining marks, dropped byte-by-byte restrictions on code values
   for UCS-2, and accepted Unicode repertoire additions. From that
   point forward, the overall aspect of what became
   ISO/IEC 10646-1:1993 was clear.

>
> - What is the name of the GB and JIS standards that have the same repertoire
> as Unicode?

GB 13000 has the same repertoire as ISO/IEC 10646-1:1993.
JIS X 0221 has the same repertoire as ISO/IEC 10646-1:1993.

Those two were effectively national publications of 10646. You can
work out the correlations with Unicode from that.

GB 18030:2000 in principle has the same repertoire (but different
encoding) as ISO/IEC 10646-1:2000, i.e. the same as Unicode 3.0.
(But there were small problems in it.) However, the 4-byte form
of GB 18030 maps all Unicode code points, assigned or not, so
it will (in theory, at least) always have the same repertoire
as Unicode.
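
To make the point about the 4-byte form concrete, here is a minimal
sketch in Python. It assumes the commonly described linear layout of
the supplementary planes, where U+10000 maps to the bytes 90 30 81 30
and each following code point takes the next 4-byte value. The function
name gb18030_four_byte is hypothetical, and the sketch covers only
U+10000..U+10FFFF (the 4-byte forms for BMP code points are partly
table-driven and are not handled here).

```python
def gb18030_four_byte(cp: int) -> bytes:
    """Compute the GB 18030 4-byte form of a supplementary code point.

    Assumption: U+10000..U+10FFFF are laid out linearly, starting at
    the byte sequence 90 30 81 30.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points are handled")
    index = cp - 0x10000
    # Bytes 2 and 4 range over 0x30..0x39 (10 values);
    # byte 3 ranges over 0x81..0xFE (126 values).
    index, b4 = divmod(index, 10)
    index, b3 = divmod(index, 126)
    b1, b2 = divmod(index, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


if __name__ == "__main__":
    # Cross-check against Python's built-in gb18030 codec. Some of these
    # code points are unassigned, yet they still encode and round-trip.
    for cp in (0x10000, 0x1D11E, 0x2A6D6, 0xEFFFF, 0x10FFFF):
        encoded = chr(cp).encode("gb18030")
        assert encoded == gb18030_four_byte(cp)
        assert encoded.decode("gb18030") == chr(cp)
        print(f"U+{cp:06X} -> {encoded.hex(' ')}")
```

Because the mapping is computed rather than looked up, it does not
depend on whether a given code point has been assigned a character yet.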

>
> - When did Unicode stop to be "16 bits"? (I.e., when were surrogates added?)

In terms of publication, with Unicode 2.0 in 1996. However, the decision
was taken by the UTC considerably before publication.

Amendment 1 to 10646-1 (UTF-16) was proposed to WG2 in WG2 N970, dated
7 February 1994. Mark Davis was the project editor for that amendment.
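
For readers who have not looked at the mechanism itself, the surrogate
scheme in UTF-16 can be sketched in a few lines of Python. This is an
illustration under the usual description of UTF-16 (a high surrogate in
D800..DBFF followed by a low surrogate in DC00..DFFF), not text from the
amendment; the function names are hypothetical.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into two 16-bit units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need surrogates")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    for cp in (0x10000, 0x1D11E, 0x10FFFF):
        high, low = to_surrogate_pair(cp)
        # Cross-check against Python's own UTF-16 (big-endian) encoder.
        expected = chr(cp).encode("utf-16-be")
        assert expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")
        assert from_surrogate_pair(high, low) == cp
        print(f"U+{cp:06X} -> {high:04X} {low:04X}")
```

Two 10-bit halves give 2^20 extra code points, which is where the
upper limit of U+10FFFF comes from.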

>
> - I can't remember the version when some scripts were added: Syriac, Thaana,
> Sinhala, Tibetan, Myanmar, Ethiopic, Cherokee, Canadian Syllabics, Ogham,
> Runes, Khmer, Mongolian, Yi, Etruscan, Gothic, Deseret, CJK ext. A, CJK ext.
> B.

See pp. 968-969 of TUS 3.0.

Tibetan was in Unicode 1.0, then was removed. It was re-added, in a
new encoding, in Unicode 2.0.

Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Cherokee, Canadian Syllabics,
Ogham, Runic, Khmer, Mongolian, Yi, CJK Extension A were added in
Unicode 3.0.

Old Italic (including Etruscan), Gothic, Deseret, and CJK Extension B
were added in Unicode 3.1.

> - Roughly, how many ideographs are in modern use in extensions A and B?

Not many. I'll defer to the IRG experts to make a guess there.

>
> - Roughly, when will version 3.2 become official?

March, 2002.

>
> - Roughly, when will the version 4 book be published?

Currently still scheduled for March, 2003, but schedule slip is
always a possibility on a major publication project like this.

> I also have a few non-Unicode questions:
>
>
> - When was ASCII first published and by whom?

1967. By ANSI X3.4.

Actually, that was preceded by ASCII per se, the earliest form of
which was published as a standard in 1963 by ASA (American Standards
Association -- the predecessor to ANSI). But the 1963 version of ASCII
had some differences from what we now know as ASCII.

>
> - What standard was current before ASCII? (BAUDOT, is it?) How many bits did
> it use?

I'll let the ancient computer and terminal mavens have at that
one. There is lots of early character encoding history available
on the web -- it's not too hard to find information about it,
actually.

>
> - Did the ASCII standard expire, and when?

No, it is still a standard.

>
> - When was ISO 646 published?

1972.

>
> - I think that ISO 646 expired. When?

No, it is still a standard. The current version is the ISO-646-IRV,
revised in 1991.

>
> - When was ISO 8859 published?

It comes in many parts, each of which has a separate publication date.

>
> - When did the first double-byte encoding appear?

Dunno. Maybe one of the IBMers will know when IBM first started
implementing double-byte Asian character sets.

--Ken


