fictional scripts revisited

From: Thomas Chan (
Date: Thu Feb 22 2001 - 01:52:36 EST

Hi all,

Between January 30-31, there was a thread here entitled "ConScript
registry?", in which I mentioned[1] the possibility of non-Western
fictional scripts gobbling up codepoints, where I gave two example .jpg
files of the kinds of Chinese fictional scripts that exist.

Whether those fictional scripts, which occur in places such as the
multi-volume Taoist Canon (Daozang)[2], are worthy of inclusion or can be
unified with "CJK Ideographs" remains to be seen, but the nature of the
encoding of logographic scripts means that fictional ones created on
such a model would tend to consume many more codepoints than fictional
alphabetic or syllabic scripts.

[1] See for archived

[2] See the many boxes (Chinese-style) of the volumes of "cheng tung tao
tsang" at a library, which is the Zhengtong canon, in the BL1900.A1
section (sorry about the US Library of Congress code).

Probably, someone will tell me that all or most are font variations,
and therefore unifiable. Fine, I accept that assumption for purposes of
this discussion.

However, I would like to raise two examples of fictional scripts and
characters from the work of a contemporary Chinese artist, XU Bing[3], who
now resides in New York. There are some interesting issues.


First, there are the 4000 new[4] "CJK Ideographs" that he created solely
for a work called _Tianshu_ (A Book from the Sky)[5] (1987-1991), which Xu
spent three years carving movable wooden type for. There is no doubt that
these are bona fide Han characters, albeit without readings and meanings.
However, the lack of readings and/or meanings, or nonce usage, has not
stopped characters before from being included in Unicode or precursor
dictionaries and standards, e.g., U+20091 and U+219CC, created as an "I
know these two characters and you don't" one-upmanship stunt; or the
various typos inherited from JIS standards. Of course we as
contemporaries know why Xu created these characters, and can disregard
them, but if Xu had lived centuries ago, there is no doubt that these 4000
characters might have found themselves included in the _Kangxi Zidian_
dictionary (1716) and/or other dictionaries and standards, and ultimately
into Unicode. But consider that these represent potentially 4000
codepoints that could be gobbled up by "fictional characters", and it
only took a single individual three years to come up with them.

[4] There is the small chance that some might accidentally already exist,
but the figure would still remain high.

[5] See http://www.echinaart.coma/Advisor/xubing/adv_xubing_gallery01.htm ,,, etc.

The second example I would like to raise is the "Square Words" or "New
English Calligraphy"[6] (I don't know which name is more appropriate,
but I will refer to it hereafter as "NEC"), which is a Sinoform script.
NEC is a system where each letter of the English alphabet[7] is equated
with one (?) component of Han characters, and each orthographic word is
written within the confines of a square block, in imitation of Chinese
writing (cf. the arrangement of Korean Hangul letters in a square block
in imitation of Chinese). We already know of the great waste of
codepoints that precomposed Hangul syllables take up in comparison to
using combining Hangul Jamo letters, as well as lack of flexibility in
creating new combinations; however, this is exactly how CJK Ideographs are
encoded in almost all implementations, including Unicode (all ~70,000+ of
them!). Thus, there's no reason to expect that NEC would be encoded any
differently.

The practice has been to encode CJK Ideographs as they are attested,
unlike the case of precomposed Hangul syllables, where many are
never used in actuality. At the moment I am not aware of any extensive
writing using NEC (although corpus size doesn't seem to hinder the
inclusion of some other scripts and characters in Unicode), but Xu's
exhibitions do invite visitors to try their hand at NEC, and there is a
published instruction book[8]. Since each orthographic English word
becomes one of these Sinoform "ideographs", there is a possibility of this
single fictional script consuming tens of thousands of codepoints as CJK
Ideographs do! (Realistically, we should assume a bare minimum of
3000-4000+ for the number of lexemes used in most English writing--I'm not
sure how NEC handles affixes and inflection. And multiply that for each
additional Latin-script language's words!)
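
To put rough numbers on the comparison, here is a minimal Python sketch. The
Hangul figures are the standard ones for modern Hangul (19 initials, 21
vowels, 27 finals plus "no final"). The nec_estimate() helper is purely
hypothetical: it assumes one codepoint per orthographic word and no sharing
between languages, which is only my guess at how a precomposed NEC encoding
would behave.

```python
import unicodedata

# Modern Hangul: 19 initial consonants, 21 vowels, 27 finals plus "no final".
LEADS, VOWELS, TAILS = 19, 21, 28

# Precomposed syllables occupy U+AC00..U+D7A3.
precomposed = LEADS * VOWELS * TAILS
# Combining jamo needed to spell the same set of syllables.
jamo_needed = LEADS + VOWELS + (TAILS - 1)

# A precomposed syllable decomposes into its jamo under NFD.
syllable = "\uD55C"  # the syllable "han"
jamo = unicodedata.normalize("NFD", syllable)


def nec_estimate(lexemes_per_language, languages):
    """Hypothetical estimate: one codepoint per orthographic word,
    with nothing shared between languages."""
    return lexemes_per_language * languages


print("precomposed syllables:", precomposed)
print("combining jamo:", jamo_needed)
print("NFD of U+D55C:", [f"U+{ord(c):04X}" for c in jamo])
print("NEC, 3 languages:", nec_estimate(3000, 3), "to", nec_estimate(4000, 3))
```

Under those assumptions, the bare-minimum lexeme counts above already put
three languages in the same range as the precomposed Hangul block, before
any real vocabulary is considered.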

At the inception of various other fictional scripts, no one could foresee
the growth of scholarly and/or amateur interest in them; likewise, we have
no way of knowing if NEC may catch on in upcoming decades or centuries.
More so than the 4000 imaginary characters given in the _Tianshu_ example,
this sort of thing is a greater threat--imagine an NEC Sinoform "ideograph"
for each orthographic word in English, Spanish, and French--just to take
three arbitrary languages--and we'll even give them the benefit of
"unification". From what I see now, the only way to handle this sort of
thing would be to throw ever more precious codepoints at it.

[6] See ,
especially the "men" and "women" restroom signs!

[7] Technically, Latin script letters--see the Spanish surnames in the
"Your surname Please" exhibit (1998) in the URL listed in footnote #6.


Thomas Chan
