Re: Original Aim of Unicode

Date: Sun May 29 2011 - 21:30:38 CDT

On 5/29/2011 6:36 PM, Richard Wordingham wrote:
> On Mon, 30 May 2011 01:35:50 +0100
> Michael Everson wrote:
>> Richard Wordingham wrote:
>>> - there was a time when Unicode was not intended to ultimately
>>> encode every script.
>> This is not true. The Universal Character Set is intended to be
>> universal.
> I said 'Unicode', not the 'Universal Character Set'. As evidence, see
> Section 2.1 of 'Unicode 88', available at
> -
> "_Distinction of 'modern-use' characters:_ Unicode gives higher
> priority to ensuring utility for the future than to preserving past
> antiquities. Unicode aims in the first instance at the characters
> published in modern text (e.g. in the union of all newspapers and
> magazines printed in the world in 1988), whose number is undoubtedly
> far below 2^14 = 16,384. Beyond these modern-use characters, all
> others may be defined to be obsolete or rare; these are better
> candidates for private-use registration than for congesting the public
> list of generally-useful Unicodes."
> This, of course, is now merely history.
That is indeed a quote from the earliest manifesto using the name
Unicode, and which could be seen as the earliest "project proposal" for
this universal character set.

However, the statements in this lone early paper are a far cry from the
scope of the enterprise once it turned into an actual project. Already
the first version of Unicode (1.0) states that

    * the limitation on modern use is primarily based on the
      "difficulties of evaluating [the] contents" of scripts not in wide
    * "less common and archaic scripts will be added to future
      versions", explicitly mentioning "Egyptian Hieroglyphics" (page 4)

One of the key insights of the Unicode project (and something in which
it goes beyond the earliest concepts) was the realization that a
universal character set needs to be universal not just with regard to
different modern linguistic groups, but also with regard to scholarly
and technical activities by members of these groups. In other words, if
all that was needed was to have digital replicas of ancient text, images
might have sufficed.

What the creators of Unicode realized was the extent to which there were
modern communities that needed to represent these ancient writings on
the same footing as modern texts so that they could write about them
and/or analyze them. The same goes for technical and scientific (and
musical) notation. In all cases, it is the need of modern users to have
all textual materials that they work with and want to interchange
covered by the universal character encoding.

The details of what this entails are occasionally still debated (see the
Emoji discussion not too long ago, or the revision of the treatment of
Braille symbols) but the general conclusion had been arrived at before
the publication of Unicode 1.0.

In other words, if there was ever a time that Unicode was not intended
to cover historic characters, it was clearly before Unicode had become
an actual character set and program of work. Because of that, there's
never been an attempt to keep these characters out of the repertoire -
other than insisting that they be well attested. As it was put in
Unicode 1.0: "For many of these scripts, extensive research will be
necessary to produce an agreed-upon encoding", something that wasn't as
much of a problem for the core repertoire of modern-use characters -
most of these characters had already been encoded in legacy character
sets, which in many cases made them de-facto characters. Unicode only
needed to ratify existing character encoding approaches for these scripts.

Once this initial repertoire of legacy encodings was exhausted, all
characters and scripts to be added had to be researched and analyzed in
greater detail, leading to the currently practiced process of character


PS: nowhere in Unicode will you find the part of your initial statement
above that has been dropped from this exchange - i.e. any criterion that
characters must be "commercially significant". The only similar
requirement (and that also goes back to Unicode 1.0) is that characters
must be in "general use." If characters are used only by a single person
or an isolated group, then there is no reason to standardize them.

