Re: Subset of Unicode to represent Japanese Kanji?

From: Kenneth Whistler
Date: Fri Jul 14 2000 - 15:03:25 EDT

John Cowan asked:

> wrote:
> > The problem with all kana (or all Roman ch) document is because there are so
> > many words with same pronounciations. For example, the Roman Characters "KAMI"
> > may mean God, or hair, or paper, or above. "HASHI" may mean bridge or chop
> > sticks. If it is written in kanji, all God, hair, paper, above, bridge, chop
> > sticks are represented in different kanjis, thus no ambiguity.
> Which is to say, that if a typical Japanese document is read aloud, it is
> a mass of ambiguity, and nobody has any idea what it says? I know this was
> true for Classical Chinese documents, but is it really true for modern
> Japanese ones?

It is not as bad as that. First of all, Foster neglected to mention that
some of the homonyms differ in pitch accent, which to the native Japanese
ear helps make lexical distinctions in a way similar to stress placement
in English. Pitch accents are not, however, written in Japanese, whether
in kanji, kana, or romaji -- partly because their placement tends to
differ depending on which local dialect of Japanese you speak.

But more importantly, Japanese, like almost any other language, has massive
redundancy in context that eliminates the ambiguity in most instances.
Thus no one is going to confuse the verb ending -te with the noun te meaning
"hand", any more than English speakers mix up "to", "too", and "two" or
"their" and "there" when they hear them used in context.

The more important verbal ambiguities tend to come in the highfalutin
Han-derived kanji vocabulary, where indeed there can be confusion about
which of the 10 or 15 or ?? different "kookoo" lexical items is in
question in a particular instance. However, even there, most will be
evident from context, and such compounds often get embedded in
even larger compounds that are less ambiguous.

But in Japanese verbal discourse, when people get confused, they will
often resort to citing the kanji "spelling" of the ambiguous words,
"koo as in xxx and koo as in xxx" -- often writing the kanji in the air
while they explain, so their interlocutor can "see" how it
is written.

> If not, then an all-romaji or all-kana representation cannot be *logically*
> insufficient; however, it is enough that people are not accustomed to it.

Written forms of language lose many of the prosodic cues that people
use in disambiguating speech. In Japanese, if you *also* lose the
morphemic identity clues given by kanji, you end up with stuff that
often has to be puzzled out by the reader. Don't forget that rapid
reading is often accomplished by gestalt-like processes that grab
entire chunks. Not writing the appropriate level of kanji destroys the
word gestalts that most Japanese readers use to pick the content carriers
(verbs, nouns and such) out of the surrounding grammatical framework, which
is written mostly in hiragana. That significantly reduces the legibility
of the text.


