Re: fictional scripts revisited

From: David Starner (
Date: Thu Feb 22 2001 - 13:40:32 EST

On Wed, Feb 21, 2001 at 10:58:06PM -0800, Thomas Chan wrote:
> First, there are the 4000 new[4] "CJK Ideographs" that he created solely
> for a work called _Tianshu_ (A Book from the Sky)[5] (1987-1991), which Xu
> spent three years carving movable wooden type for. There is no doubt that
> these are bona fide Han characters, albeit without readings and meanings.

Idiosyncratic and personal characters are not encoded in Unicode.

> However, the lack of readings and/or meanings, or nonce usage, has not
> stopped characters before from being included in Unicode or precursor
> dictionaries and standards, e.g., U+20091 and U+219CC, created as a "I
> know these two characters and you don't" one-upmanship stunt; or the
> various typos inherited from JIS standards.

I believe Unicode, as a general rule, does not encode meaningless
characters. Any currently in Unicode are either mistakes, or come from
preexisting standards.

> But consider that these represent potentially 4000
> codepoints that could be gobbled up by "fictional characters", and it
> only took a single individual three years to come up with them.

But that's not true. No one is proposing that every newly created script
that comes along be encoded in Unicode. For them to gobble up 4000
codepoints, it would take a body of work by a number of authors, like
Tengwar and Cirth have had.

> The second example I would like to raise are the "Square Words" or "New
> English Calligraphy"[6] (I don't know which name is more appropriate,
> but I will refer to it hereafter as "NEC"), which is a Sinoform script.
> NEC is a system where each letter of the English alphabet[7] is equated
> with one (?) component of Han characters, and each orthographic word is
> written within the confines of a square block, in imitation of Chinese
> writing [... CJK ideographs are precomposed in Unicode ...]
> Thus, there's no reason to expect that NEC would be encoded any
> differently.

I disagree. Say, for instance, some small* country decided to adopt NEC
as a writing style, and hence Unicode had to include it. There are
1,000,000 words in English by some counts, so it's not feasible to
encode them all in Unicode, or even some semi-complete subset. So
it would be encoded by component and treated like any other complex
script. (* I say a small country, because a large country might be
able to get a large chunk of precomposed characters encoded in Unicode.
I still don't think that it would be done solely with precomposed
characters.)
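
To put rough numbers on that, here is a minimal Python sketch. The word
count is only the "by some counts" estimate above, and the figure of 26
components is an assumption for illustration (one component per English
letter, as the quoted description of NEC suggests); neither is an exact
count.

```python
# Rough arithmetic only, not a proposal for how NEC would be encoded.
UNICODE_CODE_SPACE = 0x110000   # U+0000..U+10FFFF, i.e. 1,114,112 code points
ENGLISH_WORDS = 1_000_000       # the "by some counts" estimate from the text
NEC_COMPONENTS = 26             # assumed: one component per English letter

# Share of the entire code space needed if every word were precomposed.
precomposed_share = ENGLISH_WORDS / UNICODE_CODE_SPACE

# Share needed if only the components were encoded.
component_share = NEC_COMPONENTS / UNICODE_CODE_SPACE

print(f"precomposed words: {precomposed_share:.1%} of the code space")
print(f"components only:   {NEC_COMPONENTS} code points "
      f"({component_share:.4%} of the code space)")
```

Under those assumptions, precomposing every word would take most of the
code space, while encoding by component needs a few dozen code points.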

> At the inception of various other fictional scripts, no one could foresee
> the growth of scholarly and/or amateur interest in them;

True. That's why we wait until such interest exists before we consider
encoding a script.

David Starner -
Pointless website:
"I don't care if Bill personally has my name and reads my email and 
laughs at me. In fact, I'd be rather honored." - Joseph_Greg
