RE: Re[2]: Errata in language/script list

From: Thomas Chan (thomas@atlas.datexx.com)
Date: Mon Aug 13 2001 - 21:20:12 EDT


On Mon, 13 Aug 2001, Ayers, Mike wrote:

> > From: Thomas Chan [mailto:thomas@atlas.datexx.com]
> > No, they do. While the dominant way that Chinese languages are written
> > today, which is based on Mandarin Chinese, has been well supported since
> > pre-Unicode 3.0 days, other Chinese languages have faced the problem of
> > many unencoded (or yet-to-be-encoded) characters. I've written on this
> > matter on this list before in the past, principally about Yue Chinese (=~
> > Cantonese), but also applicable to other Chinese languages.
>
> Since those all will get coded into the Chinese alphabet (if they
> get coded), what's the point?

It's pretty simple. Just because enough of a script is encoded for the
needs of one language doesn't mean that is necessarily true for other
languages that use that script. In time, those omissions are patched up
in newer versions of Unicode. Latin, Cyrillic, Arabic, and other scripts
have all had new characters added to them in successive versions of
Unicode.

For example, if someone had asked one or two years ago (before Unicode
3.1), "Can I write Cantonese with Unicode?", the answer would have been
"no" or "not really". If it were asked today, the answer would be "yes".
But try that question today with other minority Chinese languages
substituted in, and the answer is still pretty much "no" or "not really".

 
> > Some also require different scripts, such as the Dungan living in the
> > former Soviet Union, who write in Cyrillic (I've been told all the
> > characters they need are encoded), or some Min Chinese, who write in whole
> > or part using the characters in the Bopomofo Extended block (Unicode
> > 3.0) and/or Latin (using certain letters and diacritics that weren't always
>
> If you get genuine exceptions, then list them (i.e. list "Min
> Chinese"). I get the feeling that you're talking about a darn small
> userbase here, though.

According to the SIL Ethnologue 14th ed.[1], Dungan (SIL "DNG"):
  38,000 in Kyrgyzstan (1993 Johnstone). Mother tongue speakers were 95%
  out of an ethnic population of 52,000 in the former USSR (1979
  census). Population total all countries 49,400 out of an ethnic
  population of 100,000.

[1] http://www.ethnologue.com/show_language.asp?code=DNG

I don't have figures for the size of the userbase of Min Chinese written
in Latin script offhand, but see for instance "Proposal to add Latin
characters required by Latinized Taiwanese languages to ISO/IEC 10646"[2]
(1997.6.26) under the "user community" questions.

[2] http://www.egt.ie/standards/la/taioan.html
(Did this ever become a WG2 document? I recall seeing discussions of
this once, but can't find them offhand at the moment.)

BTW, what do you consider to be a "darn small userbase", numberwise?
Would the UCAS or Cherokee userbases be too "small" by your standards to
include a mention of them?

> > encoded). There's also the Hunan women who write in the unencoded Nushu
> > script that was discussed on this list rather recently.
>
> Discussed well enough for me to know that we're talking about a
> userbase of approximately twelve and counting down. This is not a very
> pressing case.

No, it's probably not pressing at the moment.

I'm sure there are more than twelve people who use it for writing and/or
research, though. Start counting with the number of people who write
in it, and add to that figure the researchers and their assistants (i.e.,
their students) who are doing the surveys...

 
> > And this is without going into historical alternative ways of writing
> > Chinese, such as the prolific Guanhua Zimu alphabet/syllabary used in the
> > 1900s-1920s.
>
> ...which we don't really need to do, I think, since we're trying to
> stick to the useful stuff.

What do you consider "useful"? What one person considers "useless" is
"useful" to someone else. Without specific requirements like userbase
size, economic power, cultural significance, extant writings, etc., I don't
think we can start making any claims about usefulness.

The Bible (or portions of it) has been published using
Guanhua Zimu[3]. Is that not "useful" to someone?

[3] From Eugene A. Nida, ed., _Book of a Thousand Tongues_, 2nd ed.
(London: United Bible Societies, 1972):
  http://deall.ohio-state.edu/grads/chan.200/misc/guanhua_zimu.jpg

If you think historical scripts are not useful, then perhaps the four
Philippine scripts, Ogham, Runic, etc. should not be mentioned on the list.

Anyway, I don't see usefulness as one of the requisites for inclusion on
the list in question.

 
> > And then there are various transliteration schemes, which, although they
> > are not anyone's primary script, are widely employed, such as
> > Hanyu Pinyin (people do ask, as legacy GB2312 and Big5 character sets
> > don't have them, or only include ugly full-width versions) for Mandarin,
> > or Yale for Cantonese (e.g., people ask if a precomposed "m" with a grave
> > accent is encoded, as that is needed to transcribe the negative).
>
> Transliteration scripts should be treated like the bastard children
> that they are and accorded no status. Listing them would only cause
> unnecessary confusion.
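
On the narrower question raised in the quoted text (whether a precomposed
"m" with a grave accent is encoded), one way to check is to normalize the
base letter plus COMBINING GRAVE ACCENT (U+0300) to NFC and see whether
the result collapses to a single code point. The following is only a
minimal sketch in Python: the helper name precomposed_form is
hypothetical, and the answer depends on the version of the Unicode
Character Database bundled with the Python interpreter in use.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def precomposed_form(base, combining_mark):
    """Return the single precomposed character for base + mark, or None.

    Hypothetical helper for illustration: NFC composes a sequence into one
    code point only when a canonical precomposed character exists.
    """
    composed = unicodedata.normalize("NFC", base + combining_mark)
    return composed if len(composed) == 1 else None


if __name__ == "__main__":
    for base in ("a", "n", "m"):
        result = precomposed_form(base, GRAVE)
        if result is None:
            # No precomposed character: the text must use base + U+0300.
            print(f"{base} + grave: combining sequence only")
        else:
            print(f"{base} + grave: U+{ord(result):04X} {unicodedata.name(result)}")
```

Where no precomposed character is found, as is expected for "m" with
grave, the syllable can still be represented as the base letter followed
by U+0300; whether it displays well is a font and rendering matter.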

It's clear to me that you have a very low opinion of minority languages,
scripts, and characters. Whether or not transliteration is beyond the
scope of the list in question is one issue, and I agree that it would open
up the possibility of listing almost every language with almost every
script (or at least, Latin). But what's your rationale for claims like
"bastard children"? (And what is that supposed to mean, anyway?)

Thomas Chan
tc31@cornell.edu


