Re: About Kana folding

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 17 2001 - 17:26:16 EDT


Yves asked:

> If one were to need to pick Katakana versus Hiragana and fold one into the
> other (say to let people match a word or sentence in any of them), is there
> one that is preferable to the other? I think that some Katakana have no
> Hiragana equivalents, does that mean that it's always easier to go from
> Hiragana to Katakana?

I would recommend folding to Katakana if you want to do any folding at
all. Katakana is more widely used for sound symbolism and has phonetic
extensions for non-Japanese sounds, so the set of Katakana combinations
is a bit larger than that of Hiragana.
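
As a minimal sketch of what such a folding could look like in Python: the
Hiragana letters U+3041..U+3096 sit exactly 0x60 below their Katakana
counterparts U+30A1..U+30F6, so a code point offset does most of the work.
The function name is made up for illustration, and the sketch ignores
halfwidth Katakana and the length mark entirely.

```python
def fold_hiragana_to_katakana(text: str) -> str:
    """Fold Hiragana to Katakana by code point offset (illustrative only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        # U+3041..U+3096: Hiragana letters; U+309D..U+309E: iteration marks.
        # Each has a Katakana counterpart exactly 0x60 code points higher.
        if 0x3041 <= cp <= 0x3096 or 0x309D <= cp <= 0x309E:
            out.append(chr(cp + 0x60))
        else:
            # Katakana, kanji, Latin, etc. pass through unchanged.
            out.append(ch)
    return "".join(out)


print(fold_hiragana_to_katakana("\u304c\u3042\u308b"))  # Hiragana ga-a-ru
```

Going the other way would need special handling for Katakana such as
U+30F7..U+30FA (VA, VI, VE, VO), which have no single Hiragana letter to
map to.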

> Also, what are the caveats of doing such foldings (and
> is it possible to change meanings?)

Well, yes, it is possible to change meanings. The spelling in one or the
other follows conventional rules and tends not to be variable. Thus
most things spelled in Hiragana are not properly spelled in Katakana
and vice versa. In this sense, the usage of Hiragana and Katakana is
rather different from that of lowercase and uppercase. While capitalization
of certain words in Latin is part of their conventional representation,
turning an entire word into uppercase is not a misspelling of the word.

In the Japanese phrase

<kata>yunikoodo</kata> <hira>ga</hira>, <kanji>ima</kanji>

   <hira>sugu</hira> <kanji>tsuka</kanji><hira>-eru</hira>

if you switched the yunikoodo to Hiragana or the ga, sugu, or -eru
to Katakana, those would be misspellings. And while it would be a
little difficult to construct well-formed examples where switching from
Hiragana to Katakana or vice versa would produce a nice minimal pair
with totally different (but valid) meanings, for isolated matches
it is not hard at all to find distinctions. Just off the top of my
head:

   <kata>gaaru</kata> == 'girl'

   X <hira>ga aru</hira> == 'there is X'

   Hence: <kata>gaaru</kata> <hira>ga aru</hira> == 'There is a girl.'

The space for the word boundary isn't written, and while the Katakana
is written with a length mark: 30AC 30FC 30EB and the Hiragana is written
with a repeated vowel: 304C 3042 308B, a folding would presumably collapse
these together. There is also a disambiguating accent difference in the
spoken forms, but that also is not written.
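
To make the collapse concrete, here is a rough Python sketch of one way a
phonetic folding might be built. Everything in it is an assumption for
illustration: the names phonetic_key and to_katakana are invented, and
expanding the length mark U+30FC to the vowel of the preceding kana (read
off the last letter of its Unicode character name) is a simplification.
In particular it does not handle long "o" written with U in Hiragana.

```python
import unicodedata

# Katakana vowel letters, keyed by the vowel that ends a kana's Unicode name.
VOWEL_KANA = {"A": "\u30a2", "I": "\u30a4", "U": "\u30a6",
              "E": "\u30a8", "O": "\u30aa"}


def to_katakana(text: str) -> str:
    """Shift Hiragana letters U+3041..U+3096 up by 0x60 to Katakana."""
    return "".join(
        chr(ord(c) + 0x60) if 0x3041 <= ord(c) <= 0x3096 else c for c in text
    )


def phonetic_key(text: str) -> str:
    """Fold to Katakana and expand the length mark to a repeated vowel."""
    out = []
    for ch in to_katakana(text):
        if ch == "\u30fc" and out:
            # e.g. "KATAKANA LETTER GA" ends in "A", so the mark becomes A.
            name = unicodedata.name(out[-1], "")
            vowel = name[-1:] if name.startswith("KATAKANA LETTER") else ""
            out.append(VOWEL_KANA.get(vowel, ch))
        else:
            out.append(ch)
    return "".join(out)


girl = "\u30ac\u30fc\u30eb"        # Katakana ga + length mark + ru
there_is = "\u304c\u3042\u308b"    # Hiragana ga + a + ru
sentence = girl + there_is         # 'There is a girl.'

print(phonetic_key(girl) == phonetic_key(there_is))         # True
print(sentence.count(there_is))                             # 1
print(phonetic_key(sentence).count(phonetic_key(there_is))) # 2
```

Under this folding the two forms get the same key, and a search for
"ga aru" in the example sentence goes from one hit to two, one of which
is the noun 'girl'.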

There are some instances where some words or phrases are "headlined"
into Katakana for effect, but that is rather unusual.

So I'd suggest you be very careful when trying to do this kind of
a folding. If it is just for surface text matching, the number of
false positive matches would likely swamp the number of false
negatives you'd be correcting.

On the other hand, if you are doing a phonetic matching, then of course
you have to fold the Hiragana and Katakana forms together.

The more serious problem of equivalencing for matching in Japanese
would be kanji versus Hiragana, in particular. Verbs, for example,
are commonly written with a kanji plus a string of "okurigana" for
the verb ending. (See "tsukaeru" in the example above.) But the same
sequence could also just be written entirely in Hiragana. There are
some quasi-standards in this, but it often depends on the level of
reading audience you are aiming at. "Hard" kanji that many people
might not know how to read will often just be substituted out to
Hiragana in general purpose publications, school books, and the like.
And for some verbs there is variation regarding the placement of
a stem syllable -- whether it is subsumed within the reading of the kanji
or tacked on as part of the okurigana. Getting this kind of thing
right is far more important for matching in Japanese than just
brute matching of Hiragana to Katakana.

--Ken


