Disclaimer: I am not Chinese.
Mark Crispin wrote:
> I know that Cantonese is written differently than Mandarin, but I'm not sure
> how many of the other dialects have unique written languages.
It's not a matter of unique written languages; it's a matter of
a small number of language-specific non-standard characters.
These come up mostly in places like transcriptions of song lyrics;
most written texts are in Standard Chinese (Mandarin) only.
> It may also depend upon the script. For example, maybe a Chinese dialect is
> identical to Mandarin using hanzi, but is different in romanization.
> Certainly "zh-cn" vs. "zh-tw" is commonly used to distinguish between
> simplified and traditional; an obvious misuse but a convenient one. I guess
> though that with Unicode, this crutch isn't needed since I doubt that there's
> much (if any) ambiguity between simplified and traditional in Unicode text.
There is no unification of simplified characters with traditional ones.
Of course, if a character has no simplified form, it has only one Unicode codepoint.
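
To make the point about code points concrete, here is a minimal sketch in Python. The sample characters and the helper names (code_point, is_unified) are my own picks for illustration and are not taken from the message above. It only shows that a traditional/simplified pair occupies two distinct code points, while a character with no separate simplified form has just one.

```python
# Illustrative sketch: the sample characters and helper names are
# assumptions chosen for this example, not part of the original message.
TRADITIONAL_GUO = "\u570b"  # traditional form of the character for "country"
SIMPLIFIED_GUO = "\u56fd"   # simplified form of the same character
SHARED_REN = "\u4eba"       # "person"; it has no separate simplified form


def code_point(ch: str) -> str:
    """Format a single character as a U+XXXX label."""
    return f"U+{ord(ch):04X}"


def is_unified(a: str, b: str) -> bool:
    """Return True when the two forms share one code point."""
    return ord(a) == ord(b)


# The traditional and simplified forms are encoded separately.
print(code_point(TRADITIONAL_GUO), code_point(SIMPLIFIED_GUO),
      "unified:", is_unified(TRADITIONAL_GUO, SIMPLIFIED_GUO))

# A character without a simplified form is the same code point in both kinds of text.
print(code_point(SHARED_REN), "unified:", is_unified(SHARED_REN, SHARED_REN))
```

So text in either form can in principle be told apart by the characters themselves wherever a pair like this occurs, without relying on a language tag.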
> Is it the case that this document, in the absence of language tags,
> can be converted to Unicode unambiguously and display identically?
> Can the document, in the absence of language tags, be converted back to
> ISO-2022-CN and still display identically (I'm not saying that file would
> be the same)?
Schlingt dreifach einen Kreis vom dies! || John Cowan <email@example.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT