Re: fictional scripts revisited

From: Kenneth Whistler
Date: Thu Feb 22 2001 - 21:59:17 EST

Joel Rees responded:

> >
> > Idiosyncratic and personal characters are not encoded in Unicode.
> I find this a fault in UNICODE. When we go through the set algebrae in the
> introductory algebra courses for computer science, it is usually pointed out
> that a set of characters can only be artificially closed. When I consider
> that character sets are inherently open, I have to conclude that the UNICODE
> implementation is in direct conflict with one of the Consortium's primary
> stated goals.

I think you are barking up the wrong tree entirely.

The principle of not encoding idiosyncratic and personal characters
in the Unicode Standard has to do with their lack of *usefulness* and
the lack of a need to *standardize* such things. The Unicode Consortium
is not in the business of scouring the world for every individual who
wants to put pen to paper to create something unique that nobody else
has seen before. The purpose of a character encoding standard is to
promote standard interchange of character data, and that implies the
need of *groups* of people to interchange commonly shared tokens for
representation of data. Almost by definition that rules out idiosyncratic
and personal characters.

That is a completely different issue from the open or closed repertoire
status of Unicode. Among all character encoding standards that have
ever existed, the Unicode Standard is the one most explicitly wedded to
having an open repertoire, because the goal of the standard is the
universal inclusion of all scripts, modern and historic, and all
symbolic character sets needed for textual interchange.

> If the history of the ASCII set does not show this plainly enough, then the
> reality of the so-called Han characters (such as new characters being
> invented every year for various technical purposes) should bring the issue
> into better focus.

And in case you haven't noticed, the Unicode Standard keeps adding
Han characters, including 42,711 just this year!

If a personal name Han character neologism comes into general usage, so
that there is some information processing requirement for it, then it
is quite likely that it will have graduated from the "personal character"
status to public usage, and thus be eligible for standardization in
the future.

> If UNICODE can never attempt to address the issue of non-closure,

You've got it completely backwards. The Unicode Standard is the one
with the open repertoire, which is why it keeps expanding year to year.

> it will be
> superseded. That's no big deal, most standards are superseded eventually,
> and the research that is being done to build the UNICODE standard now is a
> necessary step. But if the UNICODE consortium can be flexible enough to
> start preparing to tackle the non-closure issues, the jump to the next
> standard will be a lot easier, and can be postponed a lot longer.

No doubt the Unicode Standard will be superseded some day -- perhaps
when the accumulated legacy interworking cruft in the standard for
old character encoding sets ages to the point where nobody cares about
it anymore.

But it is unlikely to happen in my lifetime. Unicode is starting to
show the venerability and ubiquity that is the hallmark of ASCII's
longevity. It is likely to be around for quite a while.


> Joel Rees, Media Fusion
> Amagasaki, Japan
