Re: Locale ID's again: simplified vs. traditional

From: Doug Ewell (dewell@compuserve.com)
Date: Tue Oct 03 2000 - 00:15:24 EDT


Yung-Fong "Frank" Tang <ftang@netscape.com> wrote:

> Steven R. Loomis wrote:
>
>> In RFC1766 usage, "zh-tw" is often used to mean traditional Chinese,
>> and "zh-cn" is used for simplified. This occurs in places such as HTTP
>> headers and xml:lang tags.
>
> No. "zh-tw" only means Chinese as used in Taiwan, and "zh-cn" only
> means Chinese as used in China. It happens that people in Taiwan use
> Traditional Chinese and people in China use Simplified Chinese. The
> code "zh-tw" itself does not imply traditional or simplified Chinese.

No, but Steven is right: in RFC 1766 *usage*, those codes are often
*used* to mean those things. He never said the usage was officially
sanctioned in RFC 1766 or anywhere else.

> Also, I think RFC1766
> (http://www.cis.ohio-state.edu/htbin/rfc/rfc1766.html)
> itself does not define what "zh-tw" or "zh-cn" means. It only defines
> how to use ISO 3166 and ISO 639 to identify a language. Actually, if
> you search RFC1766, you won't even find the string "zh-tw" in the RFC.

The relevant quote from RFC 1766 is:

  "In the first subtag... [a]ll 2-letter codes are interpreted as ISO
  3166 alpha-2 country codes denoting the area in which the language is
  used."

The original intent seems to have been to distinguish language variants
as used in different countries. Two classic examples are en-GB vs.
en-US and fr-FR vs. fr-CA. What has happened in the Chinese case is
that, instead of identifying language variants, the country code in the
first subtag is being used to identify script variants (traditional vs.
simplified Chinese), and -TW and -CN respectively are being used as the
canonical codes for these script variants. Obviously the hack breaks
down if, as Frank hypothesized, both Chinas ever decided to use the
same script.
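
To make the hack concrete, here is a minimal sketch in Python of what
software that follows this convention is effectively doing. The function
names and the lookup table are hypothetical, made up for illustration;
the table records common usage only and is not defined by RFC 1766 or
any other standard. Tags are compared case-insensitively.

```python
# Usage convention only: RFC 1766 assigns no script meaning to country codes.
SCRIPT_BY_CONVENTION = {
    ("zh", "tw"): "traditional",
    ("zh", "cn"): "simplified",
}


def split_tag(tag):
    """Split a language tag into (primary tag, first subtag or None), lowercased."""
    parts = tag.lower().split("-")
    primary = parts[0]
    first = parts[1] if len(parts) > 1 else None
    return primary, first


def region_of(tag):
    """Return the first subtag as a country code if it is two letters, else None.

    This is the only meaning RFC 1766 itself gives a 2-letter first subtag.
    """
    _, first = split_tag(tag)
    if first is not None and len(first) == 2 and first.isalpha():
        return first.upper()
    return None


def guess_han_script(tag):
    """Guess the Han script variant from the country code, by convention only."""
    return SCRIPT_BY_CONVENTION.get(split_tag(tag))
```

With this sketch, guess_han_script("zh-TW") gives "traditional" and
guess_han_script("zh-cn") gives "simplified", while a bare "zh" gives
None because the tag carries no country code to lean on. The guess is
only as good as the assumption that each country uses exactly one script.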

> If you use zh_HK_TW, then some people could interpret it as "The
> language used by the Hong Kong people who live in the region governed
> by Taiwan" (which has never existed).

Steven knows this already. His question was, if the -CN/-TW approach
is flawed, then what is better?

I would like to be able to point to draft ISO standard 15924, "Code for
the representation of names of scripts," together with the proposed
successor to RFC 1766, which would allow the use of 15924 script tags
in language tag identifiers. Unfortunately, I see only one tag for Han
ideographs in the 15924 draft (2000-05-18 revision), which means that
15924 does not address this particular distinction, and Steven and
others will have to continue using -TW and -CN as script codes.

-Doug Ewell
 Fullerton, California


