Re: Questions about Unicode history

From: Mark Davis (mark@macchiato.com)
Date: Thu Jan 31 2002 - 10:08:30 EST


For when particular characters were added to Unicode, you can also
consult the new DerivedAge.txt, currently in the Unicode 3.2 beta directory at:

http://www.unicode.org/Public/BETA/Unicode3.2/DerivedAge-3.2.0d2.txt
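
If you want to look up a character's age programmatically, a minimal sketch
along these lines should work. It assumes the semicolon-separated layout
used by DerivedAge.txt ("XXXX..YYYY ; age # comment", or a single code point
in place of the range). The helper names parse_derived_age and age_of, and
the two sample lines, are illustrative only; point it at the real file for
actual data.

```python
from typing import Optional

# Illustrative sample only, assuming the DerivedAge.txt line layout.
SAMPLE = """\
# Age=V1_1
0000..001F    ; 1.1 #  [32] <control-0000>..<control-001F>
20AC          ; 2.1 #       EURO SIGN
""".splitlines()


def parse_derived_age(lines):
    """Parse DerivedAge-style lines into (start, end, age) tuples.

    Hypothetical helper: blank lines and '#' comments are skipped; each
    data line is assumed to be 'range_or_code_point ; age'.
    """
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        cps, age = (field.strip() for field in data.split(";")[:2])
        if ".." in cps:
            lo, hi = cps.split("..")
        else:
            lo = hi = cps  # single code point
        ranges.append((int(lo, 16), int(hi, 16), age))
    return ranges


def age_of(code_point, ranges) -> Optional[str]:
    """Return the version in which code_point was assigned, or None."""
    for start, end, age in ranges:
        if start <= code_point <= end:
            return age
    return None  # unassigned, or not covered by the lines parsed
```

A linear scan is fine for an occasional lookup; for many lookups you would
sort the ranges and use bisect instead.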

Mark
—————

Πόλλ’ ἠπίστατο ἔργα, κακῶς δ’ ἠπίστατο πάντα — Ὁμήρου Μαργίτῃ
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]

http://www.macchiato.com

----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>
To: <marco.cimarosti@essetre.it>
Cc: <unicode@unicode.org>; <kenw@sybase.com>
Sent: Wednesday, January 30, 2002 12:18
Subject: Re: Questions about Unicode history

> Marco,
>
> I'll answer as many of your questions as I can, and will
> cc this to the unicode list (in part to forestall a gazillion
> "Well, I think maybe X" responses).
>
> --Ken
>
> > - When did the Unicode project start, and who started it?
>
> The detailed history for this will soon be available on the
> Unicode website. The short answer is that Joe Becker (Xerox) and
> Lee Collins (Apple) were highly instrumental in getting the
> ball rolling on this, and the preliminary work they did,
> primarily on Han unification, dated from 1987.
>
> However, "the Unicode project" had many beginnings -- many points
> where you could mark a milestone in its early development. And
> the Unicode Consortium celebrated a number of 10-year
> anniversaries, starting from 1998 and continuing through last year.
>
> >
> > - Is it true that Han unification was the core of Unicode, and that the
> > idea of a universal encoding came afterwards?
>
> The effort by Xerox and Apple to do a Han unification was key to
> the motivation that eventually led to a serious effort to actually
> *do* Unicode and then to establish the Unicode Consortium to
> standardize and promote it. However, the idea of a universal encoding
> predated that considerably. In some respects, the Xerox Character Code
> Standard (XCCS) was a serious attempt at providing a universal
> character encoding (although it did not include a unified Han
> encoding, only Japanese kanji). XCCS 2.0 (1980) contained, in
> addition to Japanese kanji: Latin (with IPA), Hiragana, Bopomofo, Katakana,
> Greek, Cyrillic, Runic, Gothic, Arabic, Hebrew, Georgian, Armenian,
> Devanagari, Hangul jamo, and a wide variety of symbols. The early
> Unicoders mined XCCS 2.0 heavily for the early drafts of Unicode 1.0,
> and always regarded it as the prototype for a universal encoding.
>
> Additionally, you have to consider that the beginning of the ISO project
> for a Multi-octet Universal Character Set (10646) predated the
> formal establishment of Unicode. Part of the impetus for the serious
> work to standardize Unicode was, of course, discontent with the
> architecture of the then-current drafts of 10646.
>
> >
> > - Who invented the name "Unicode", and when?
>
> This one has a definitive answer: Joe Becker coined the term,
> for "unique, universal, and uniform character encoding", in 1987.
> First documented use is in December, 1987.
>
> >
> > - When did the ISO 10646 project start?
>
> Unfortunately, the document register for early WG2 documents doesn't
> have dates for all the early documents, and I don't have all the
> early documents to check. But...
>
> The 4th meeting of WG2 was held in London in February, 1986. The
> first three meetings were in Geneva, Turin, and London, respectively.
> That puts the likely timeframe for the Geneva meeting, and for the
> establishment of WG2 by SC2, at about 1984. The *only* project for WG2
> was 10646.
>
> Some of the older oldtimers on the list may have more exact information
> about the early WG2 work.
>
> >
> > - When did Unicode and ISO 10646 merge?
>
> It wasn't a single date that can be pointed to, like the signing
> of an armistice. In some respects, Unicode and ISO 10646 are *still*
> merging, as modifications and amendments to deal with niggling little
> architectural edge cases are worked out.
>
> However, the key dates were:
>
> January 3, 1991. Incorporation of the Unicode Consortium, which
> signalled to SC2 that the Unicoders were serious in their
> intentions.
>
> May, 1991. Meeting #19 of WG2 in San Francisco. An ad hoc meeting
> took place between WG2 members and some Unicoders, which paved
> the way for the later "merger" of the standards.
>
> June, 1991. The 10646 DIS 1 was defeated in its balloting. That left
> an architectural compromise with the Unicode Standard, which at that
> point was in copy edit and about to go to press, as the only
> reasonable way forward.
>
> June 3, 1991. The date of "10646M proposal draft to merge Unicode and
> 10646", by Ed Hart. This was a key document in the resulting
> merger of features.
>
> August, 1991. The Geneva WG2 meeting accepted Han unification and
> combining marks, dropped the byte-by-byte restrictions on code values
> for UCS-2, and accepted the Unicode repertoire additions. From that
> point forward, the overall shape of what became ISO/IEC 10646-1:1993
> was clear.
>
> >
> > - What are the names of the GB and JIS standards that have the same
> > repertoire as Unicode?
>
> GB 13000 has the same repertoire as ISO/IEC 10646-1:1993.
> JIS X 0221 has the same repertoire as ISO/IEC 10646-1:1993.
>
> Those two were effectively national publications of 10646. You can
> work out the correlations with Unicode from that.
>
> GB 18030:2000 in principle has the same repertoire (but different
> encoding) as ISO/IEC 10646-1:2000, i.e. the same as Unicode 3.0.
> (But there were small problems in it.) However, the 4-byte form
> of GB 18030 maps all Unicode code points, assigned or not, so
> it will (in theory, at least) always have the same repertoire
> as Unicode.
>
> >
> > - When did Unicode stop being "16 bits"? (I.e., when were surrogates
> > added?)
>
> In terms of publication, with Unicode 2.0 in 1996. However, the decision
> was taken by the UTC considerably before publication.
>
> Amendment 1 to 10646-1 (UTF-16) was proposed to WG2 in WG2 N970, dated
> 7 February 1994. Mark Davis was the project editor for that amendment.
>
> >
> > - I can't remember the version in which some scripts were added: Syriac,
> > Thaana, Sinhala, Tibetan, Myanmar, Ethiopic, Cherokee, Canadian
> > Syllabics, Ogham, Runes, Khmer, Mongolian, Yi, Etruscan, Gothic,
> > Deseret, CJK ext. A, CJK ext. B.
>
> See pp. 968-969 of TUS 3.0 (The Unicode Standard, Version 3.0).
>
> Tibetan was in Unicode 1.0, then was removed. It was re-added, in a
> new encoding, in Unicode 2.0.
>
> Syriac, Thaana, Sinhala, Myanmar, Ethiopic, Cherokee, Canadian Syllabics,
> Ogham, Runic, Khmer, Mongolian, Yi, and CJK Extension A were added in
> Unicode 3.0.
>
> Old Italic (including Etruscan), Gothic, Deseret, and CJK Extension B
> were added in Unicode 3.1.
>
> > - Roughly, how many ideographs in extensions A and B are in modern use?
>
> Not many. I'll defer to the IRG experts for a guess there.
>
> >
> > - Roughly, when will version 3.2 become official?
>
> March, 2002.
>
> >
> > - Roughly, when will the version 4 book be published?
>
> Currently still scheduled for March, 2003, but schedule slip is
> always a possibility on a major publication project like this.
>
> > I also have a few non-Unicode questions:
> >
> >
> > - When was ASCII first published and by whom?
>
> 1967. By ANSI X3.4.
>
> Actually, that was preceded by earlier forms of ASCII itself, the first
> of which was published as a standard in 1963 by ASA (American Standards
> Association -- the predecessor to ANSI). But the 1963 version of ASCII
> had some differences from what we now know as ASCII.
>
> >
> > - What standard was current before ASCII? (BAUDOT, is it?) How many
> > bits did it use?
>
> I'll let the ancient computer and terminal mavens have at that
> one. There is lots of early character encoding history available
> on the web -- it's not too hard to find information about it,
> actually.
>
> >
> > - Did the ASCII standard expire, and when?
>
> No, it is still a standard.
>
> >
> > - When was ISO 646 published?
>
> 1972.
>
> >
> > - I think that ISO 646 expired. When?
>
> No, it is still a standard. The current version, including the ISO 646
> IRV (International Reference Version), was revised in 1991.
>
> >
> > - When was ISO 8859 published?
>
> It comes in many parts, each of which has a separate publication date.
>
> >
> > - When did the first double-byte encoding appear?
>
> Dunno. Maybe one of the IBMers will know when IBM first started
> implementing double-byte Asian character sets.
>
> --Ken
>
>



This archive was generated by hypermail 2.1.2 : Thu Jan 31 2002 - 09:59:05 EST