Re: Definition of Unicode

From: John Cowan (john_cowan@hotmail.com)
Date: Mon Jul 07 1997 - 11:34:22 EDT


Due to network problems, I can read mail at cowan@ccil.org, but
can't post/reply/send from there. Please direct all replies to
cowan@ccil.org, not the HotMail address. Thanks.

David Brauer wrote:

> Heh folks, not to start an online broiling of the poor guy, but here's
> the definition of Unicode from The Free On-line Dictionary of
> Computing.
>
> http://wagner.Princeton.EDU/foldoc/contents.html
>
> Unicode
>
> 1. A subset of ISO 10646, a 16-bit character code intended to cover all
> of the world's writing systems, including Roman,
> Greek, Cyrillic, Chinese, hiragana, katakana, Devanagari, Easter Island
> "rongo-rongo", and even Elvish.

Mea maxima culpa. I was the guy who wrote that sentence,
not at all as a definition of Unicode, but as a one-sentence
explanation in another definition altogether.

The Jargon File (http://www.ccil.org/jargon) is a well-known on-line
and paper glossary of hacker slang. Jargon File version 2.0 contained
the IBM term "zigamorph", meaning the EBCDIC non-character FF (used
as a terminator). I proposed to Eric Raymond, the Jargon File editor,
that U+FFFF also be called "zigamorph", due to its analogous status
within Unicode.

At that time (in the pre-Java era), I thought a one-line lighthearted
explanation of what Unicode is about would be useful. I also knew a
whole lot less about the Unicode Standard than I do today.

The current Jargon File version (4.0.0) says, in full:

zigamorph /zig'*-morf/ /n./

1. Hex FF (11111111) when used as a delimiter or fence character.
Usage: primarily at IBM shops. 2. [proposed] /n./ The Unicode
non-character U+FFFF (1111111111111111), a character code which is
not assigned to any character, and so is usable as end-of-string.
Unicode (a subset of ISO 10646) is a 16-bit character code intended
to cover all of the world's writing systems, including Roman,
Greek, Cyrillic, Chinese, hiragana, katakana, Devanagari, Ethiopic,
Thai, Laotian and many other languages (support for elvish is
planned for a future release).

The bracketed "[proposed]" means that this is a usage
invented by somebody associated with the Jargon File, rather
than a term in active (or former) use. I hope to popularize it.
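
To make sense 2 concrete, here is a minimal sketch of U+FFFF used as an
end-of-string marker. It assumes strings are held as plain lists of 16-bit
code units; the names ZIGAMORPH, terminate and read_terminated are invented
for illustration and come from neither the Jargon File nor the Unicode
Standard.

```python
# Hypothetical illustration: U+FFFF as an end-of-string sentinel.
ZIGAMORPH = 0xFFFF  # sixteen one-bits; not assigned to any character


def terminate(text):
    """Return text as a list of 16-bit code units followed by U+FFFF."""
    units = [ord(ch) for ch in text]
    # The sentinel only works if it can never occur in the data itself,
    # and this sketch only handles values that fit in 16 bits.
    if any(u >= ZIGAMORPH for u in units):
        raise ValueError("text must contain only code units below U+FFFF")
    return units + [ZIGAMORPH]


def read_terminated(units):
    """Decode code units up to (not including) the first U+FFFF."""
    chars = []
    for u in units:
        if u == ZIGAMORPH:
            break
        chars.append(chr(u))
    return "".join(chars)
```

Anything after the sentinel is ignored by read_terminated, which is the
same job the EBCDIC FF does in sense 1.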

I know that "subset of 10646" is misleading, and I have asked
to have it removed from the next edition, along with the
anachronistic reference to Ethiopic (how long, O Lord,
how long?) and the reference to "languages" instead of "scripts".

-- 
John Cowan						cowan@ccil.org
			e'osai ko sarji la lojban



