Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

From: DougEwell2@cs.com
Date: Mon Feb 26 2001 - 11:17:27 EST


In a message dated 2001-02-26 02:01:51 Pacific Standard Time,
marco.cimarosti@essetre.it writes:

> Doug Ewell wrote:
> > A *script* like Latin or Cyrillic typically has many more
> > characters than any one language will ever use.
> > An *alphabet* is, by definition, language-specific.
>
> Hhmmm...
>
> We probably all agree that Chinese, Japanese and Korean share the "CJK
> script".
>
> But would you say, following your definition, that the subset of the CJK
> Script used to write Mandarin in Mainland China should be called "The
> Chinese Simplified *Alphabet*"?
>
> I know that the term "alphabet" is used in a similar manner in information
> theory, but this doesn't sound very fine to me when talking about writing
> systems.

As usual, I have used terminology when thinking and talking about alphabetic
scripts that does not apply to non-alphabetic scripts. The original
discussion was about the Latin script, and I dragged the Cyrillic script into
it. It was not intended to describe non-alphabetic scripts, although the
same situation may apply to them: { alphabet, abjad, syllabary } != script.

I can defend myself (in the slick lawyerly way) by pointing out that in my
statement above, I did not claim that every script has an alphabet as a
subset, only that -- echoing Peter Constable's original point -- an alphabet
is a subset of a script. So is an abjad or syllabary, for that matter. (Is
there an English-language term for the subset of the CJK ideographic script
that is used by a given language, say, Japanese?)

Also note that the "subsets" being discussed can in fact encompass the entire
script. As mentioned, there is no language that uses all the characters in
the Latin or Cyrillic scripts, but I would guess (on a Monday morning before
work) that the Maldivian (Dhivehi) alphabet uses the entire Thaana script.
The same might be true for many of the Indic scripts such as Tamil. I have
forgotten the mathematical term for a subset that contains the same items as
the main set, otherwise I would use it here.
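
For what it is worth, the subset relationship can be sketched in a few lines
of Python. This is only a rough illustration, not a definitive test: it
approximates "script" by the prefix of each Unicode character name, since the
standard library exposes no script property, and the helper script_letters,
the code point ranges, and the set hypothetical_dhivehi are all assumptions
made up for the example. In particular, treating the Dhivehi alphabet as
equal to the Thaana letters simply encodes the guess above; it is not a
verified fact.

```python
import unicodedata

def script_letters(prefix, start, end):
    """Collect letters in [start, end] whose Unicode name begins with prefix.

    Hypothetical helper: the name prefix is a crude stand-in for the
    script property, which the standard library does not provide.
    """
    found = set()
    for cp in range(start, end + 1):
        ch = chr(cp)
        if ch.isalpha() and unicodedata.name(ch, "").startswith(prefix):
            found.add(ch)
    return found

# Rough stand-in for (part of) the Latin script: LATIN letters up to U+024F.
latin_script = script_letters("LATIN ", 0x0041, 0x024F)

# The English alphabet, one language-specific subset of that script.
english_alphabet = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

english_is_subset = english_alphabet <= latin_script   # subset
english_is_proper = english_alphabet < latin_script    # strictly smaller subset

# Thaana letters by the same name-prefix approximation.
thaana_script = script_letters("THAANA ", 0x0780, 0x07BF)

# ASSUMPTION: models the guess that Dhivehi uses every Thaana letter.
hypothetical_dhivehi = set(thaana_script)

dhivehi_is_subset = hypothetical_dhivehi <= thaana_script  # still a subset
dhivehi_is_proper = hypothetical_dhivehi < thaana_script   # but not a proper one

print(english_is_subset, english_is_proper)
print(dhivehi_is_subset, dhivehi_is_proper)
```

The `<=` and `<` operators on Python sets distinguish the two cases: an
alphabet that leaves some of the script unused is a proper subset, while one
that covers the whole script is a subset that is not proper.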

> My current understanding of the two terms is the following:
>
> - "Script" is a generic term meaning a writing system of any kind, its
> inventory of signs and its orthographic rules.
>
> - "Alphabet" is a specific class of scripts, whose principal characteristic
> is that tends to map each sign to one of the language's phonemes. It
> opposes to, e.g., a "syllabic" script, which maps longer sequences of
> phonemes (often in the form consonant+vowel) and a "logographic" script
> which maps signs to morphemes ("words" or parts of words). Someone
> subdivides this definition of "alphabet" in various classes according to
> whether all phonemes are equally mapped to symbols, or only some of them
> (e.g. an alphabet that privileges consonants over vowels is also called an
> "abjad", I think).

All of this looks good to me.

-Doug Ewell
 Fullerton, California


