Kenneth Whistler explained:
> Joel Rees responded:
> > >
> > > Idiosyncratic and personal characters are not encoded in Unicode.
> > I find this a fault in UNICODE. When we go through the set algebrae in
> > introductory algebra courses for computer science, it is usually pointed
> > out that a set of characters can only be artificially closed. When I
> > consider that character sets are inherently open, I have to conclude that the
> > implementation is in direct conflict with one of the Consortium's
> > stated goals.
> I think you are barking up the wrong tree entirely.
> The principle of not encoding idiosyncratic and personal characters
> in the Unicode Standard has to do with their lack of *usefulness* and
> the lack of a need to *standardize* such things. The Unicode Consortium
Actually, what I _think_ I am after is a standard way to let the standards
committee(s) escape having to deal with idiosyncratic and personal
characters until they actually come into general use.
> is not in the business of scouring the world for every individual who
> wants to put pen to paper to create something unique that nobody else
> has seen before. The purpose of a character encoding standard is to
> promote standard interchange of character data, and that implies the
> need of *groups* of people to interchange commonly shared tokens for
> representation of data. Almost by definition that rules out idiosyncratic
> and personal characters.
> That is a completely different issue from the open or closed repertoire
> status of Unicode. Among all character encoding standards that have
> ever existed, the Unicode Standard is the one most explicitly wedded to
> having an open repertoire, because the goal of the standard is the
> universal inclusion of all scripts, modern and historic, and all
> symbolic character sets needed for textual interchange.
> > If the history of the ASCII set does not show this plainly enough, then the
> > reality of the so-called Han characters (such as new characters being
> > invented every year for various technical purposes) should bring the
> > issue into better focus.
> And in case you haven't noticed, the Unicode Standard keeps adding
> Han characters, including 42,711 just this year!
And I'm not sure it was such a good idea. But I'm not a hardworking member
of the consortium, just a nut with an axe to grind.
(I guess that cliched idiom is kind of dangerous since the proliferation of
a certain class of horror movie. Strictly metaphorical.)
I apologize for being obnoxious here, Ken. I realize you have personally put
a lot into getting that block into the standard. The noise I am generating
here is rather late in the game, but I really want to push the idea that the
international standard should be a standard of commonality, rather than
> If a personal name Han character neologism comes into general usage, so
> that there is some information processing requirement for it, then it
> is quite likely that it will have graduated from the "personal character"
> status to public usage, and thus be eligible for standardization in
> the future.
> > If UNICODE can never attempt to address the issue of non-closure,
> You've got it completely backwards. The Unicode Standard is the one
> with the open repertoire, which is why it keeps expanding year to year.
I know you can't foresee breaking past 17 planes, but remember that ten
years ago it was still a little hard to imagine gigabyte hard drives being on
everyone's desktops. Shoot. In my freshman year we were glad to get a little
piece of the college's brand-new 100 MB drive for our basic programs. I balked
at the idea of needing more than 16 bits to address both program and data.
I'm telling you that 17 planes is not enough, and it _will_ become a painful
constraint in your lifetime.
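
For anyone who wants the arithmetic behind the 17-plane figure, here is a
rough sketch in Python. It assumes the ceiling is the one set by UTF-16
surrogate pairs (1,024 lead values times 1,024 trail values on top of the
base plane). The function names are made up for illustration and are not
taken from any standard or library.

```python
# Sketch only: where the 17-plane ceiling comes from, assuming the limit
# is the reach of UTF-16 surrogate pairs. All names here are hypothetical.

PLANE_SIZE = 0x10000                 # 65,536 code points per plane
HIGH_SURROGATES = 0xDC00 - 0xD800    # 1,024 lead values, U+D800..U+DBFF
LOW_SURROGATES = 0xE000 - 0xDC00     # 1,024 trail values, U+DC00..U+DFFF


def supplementary_code_points():
    """Code points reachable through one lead/trail surrogate pair."""
    return HIGH_SURROGATES * LOW_SURROGATES


def total_planes():
    """The base plane plus however many planes the pairs can address."""
    return 1 + supplementary_code_points() // PLANE_SIZE


def to_surrogate_pair(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    print(total_planes())                               # 17
    print(hex(total_planes() * PLANE_SIZE - 1))         # 0x10ffff
    print([hex(u) for u in to_surrogate_pair(0x10FFFF)])  # ['0xdbff', '0xdfff']
```

The last code point, U+10FFFF, uses the last lead value and the last trail
value, so under this assumption an 18th plane would need a change to the
encoding form itself, not just more assignments.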
Maybe I'm a crackpot, but the need is there and people will use and abuse
UNICODE in ways that you probably don't want to imagine. What I'm trying to
push is building the mechanism now for dodging most of the abuse.
[ clipped ]
Joel Rees, Media Fusion
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT