Re: Possibilities of future expansion (from Perception etc thread and fictional etc thread)

From: Thomas Chan
Date: Sun Feb 25 2001 - 13:09:49 EST

On Sun, 25 Feb 2001, William Overington wrote:

> I find the Private Use Areas of great interest and a valuable resource.
> However, use of the private use characters requires agreement between users
> if private use characters are to be used for exchanging information between
> people. Already there is a development of the ConScript registry. This has
> its influence. I am researching a concept that I am hoping to call a
> uniengine that uses a few more than 1024 characters. For research purposes
> I am placing it in the private use area. From the unicode documentation, I
> have decided to place it in the middle, using U+EC00 to U+EFFF as a block and
> placing the additional character codes in U+EB00 to U+EBFF. Yet I checked
> at the ConScript registry to ensure that I was not clashing with that
> research work. If the uniengine concept becomes popular maybe it will
> become encoded by the committees into the standard. I feel that the
> interesting point though is to ask whether, just because there has been
> mention in the unicode list of that range for a particular line of research
> work, notes will be made of the fact in documents here and there amongst
> researchers in the unicode area, so that any possibilities of clashes of
> meaning with some other person's use of a particular code in those ranges is
> noted. The very fact that I felt it desirable to check at the ConScript
> registry is to my mind a demonstration that the private use area is already
> something other than a private use area.

On the other hand, apparently neither CSUR nor you were aware of (or
chose to ignore) the clashes with the mappings between the PUA and legacy
CJK encodings and character sets, which [the mappings] have already been
implemented for 5+ years now on CJK versions of Windows.

For the particular ranges you've chosen (U+EB00 .. U+EFFF), you clash
with parts of the following mappings (legacy code range on the left, PUA
range on the right):

    9A41-A0FE U+E000-U+E4DD
    AA41-AFFE U+E4DE-U+E909
    F8A1-FEFE U+E90A-U+EDE7

    FA40-FEFE U+E000-U+E310
    8E40-A0FE U+E311-U+EEB7
    8140-8DFE U+EEB8-U+F6B0
    C6A1-C8FE U+F6B1-U+F848
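
A minimal sketch of checking that overlap mechanically, assuming each row
listed above is a closed, inclusive range. The labels "table 1" and
"table 2" and the helper names overlap() and clashes() are made up for
illustration; they are not code page names or part of any library.

```python
# Sketch: which of the listed legacy-to-PUA mappings overlap U+EB00..U+EFFF?
# Rows are copied from the two tables above. "table 1" / "table 2" are
# placeholder labels only (assumption), not names of code pages.
MAPPINGS = [
    ("table 1", "9A41-A0FE", 0xE000, 0xE4DD),
    ("table 1", "AA41-AFFE", 0xE4DE, 0xE909),
    ("table 1", "F8A1-FEFE", 0xE90A, 0xEDE7),
    ("table 2", "FA40-FEFE", 0xE000, 0xE310),
    ("table 2", "8E40-A0FE", 0xE311, 0xEEB7),
    ("table 2", "8140-8DFE", 0xEEB8, 0xF6B0),
    ("table 2", "C6A1-C8FE", 0xF6B1, 0xF848),
]


def overlap(a_start, a_end, b_start, b_end):
    """Return the inclusive intersection of two inclusive ranges, or None."""
    lo, hi = max(a_start, b_start), min(a_end, b_end)
    return (lo, hi) if lo <= hi else None


def clashes(start, end, mappings=MAPPINGS):
    """List every mapping row whose PUA range intersects [start, end]."""
    found = []
    for table, legacy, pua_start, pua_end in mappings:
        hit = overlap(start, end, pua_start, pua_end)
        if hit is not None:
            found.append((table, legacy, hit[0], hit[1]))
    return found


if __name__ == "__main__":
    for table, legacy, lo, hi in clashes(0xEB00, 0xEFFF):
        count = hi - lo + 1
        print(f"{table}: {legacy} overlaps U+{lo:04X}-U+{hi:04X} ({count} code points)")
```

Passing a different start and end to clashes() would show which rows, if
any, another candidate block runs into.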

Various groups such as academics, input projects, newspapers, etc. use
these ranges; perhaps the most prominent is HKSCS (and its predecessor
GCCS), which had been placed in CP950's user-defined zones, and thus has
a mapping to the PUA (and .tte fonts are created by using PUA
codepoints in the cmap). If you chose other ranges, you might clash with
CP932's or CP949's. Similar ranges are used on the Macintosh as well. (I
don't know about other platforms.)

Of course, all of these clash with each other, CSUR, Microsoft's "Symbol",
etc. in whole or in part. I don't see any particular reason why anyone
should really care about clashes.

Thomas Chan
