Re: Furigana

From: Dan Kogai (dankogai@dan.co.jp)
Date: Thu Aug 08 2002 - 06:59:23 EDT


On Thursday, August 8, 2002, at 04:17 , Michael Everson wrote:
> Where do I start looking for information about implementing furigana?
> Can you have more than one gloss attached to a word? We are considering
> implementing this for Blissymbols.

What do you mean by "implementing"? Or to what extent do you want
furigana implemented?

FYI, for those who have no idea what "furigana" means, here is the
definition: furigana are kana (Japanese phonetic characters) that you
put on top of or beside Kanji (ideographic characters) to show their
pronunciation. Furigana was invented because, unlike in Chinese and
Korean, where each character has only one pronunciation (with a few
exceptions, of course), most Kanji in Japanese have multiple
pronunciations. Furigana is commonly seen in books and manga for kids,
but creative use of furigana in literature is also common (e.g.
"Government" in Kanji and "Leech" in Furigana).

Anyway... do you mean "implementing" as:

- Rendering furigana?

Many Japanese word processors already have that capability. XHTML 1.1
has the <ruby> element (W3C Ruby Annotation) exactly for that purpose.
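
To make the rendering case concrete, here is a minimal Python sketch that
produces ruby markup for a base text and its reading. The function name
ruby() is made up for illustration, and the element layout assumes the
"simple ruby" form (rb, then rt wrapped in rp fallback parentheses); treat
it as a sketch, not as what any particular word processor emits.

```python
from html import escape


def ruby(base: str, reading: str) -> str:
    """Wrap a base text and its reading in simple ruby markup.

    The <rp> elements hold fallback parentheses for renderers that
    do not understand ruby; both strings are escaped for markup.
    """
    return (
        "<ruby>"
        "<rb>" + escape(base) + "</rb>"
        "<rp>(</rp><rt>" + escape(reading) + "</rt><rp>)</rp>"
        "</ruby>"
    )


if __name__ == "__main__":
    print(ruby("漢字", "かんじ"))
```

A renderer that supports ruby draws the reading above (or beside) the base
text; one that does not falls back to the base followed by the reading in
parentheses.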

- Auto-furiganize given text?

That one is way harder (and theoretically impossible to do perfectly).
To do so, you have to:

-- Tokenize the text and extract the Kanji parts (though in creative use
furiganized words don't have to be in Kanji; here we are talking about
automation, so we just ignore such nits). There are several
tokenization engines available, some even open source, like Kakasi
(http://kakasi.namazu.org/).
-- Find the kana representation of the given Kanji. That one is harder
than you might think because the kana reading is "context driven". A
simple dictionary lookup won't cut it.
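
The two steps above can be sketched in Python roughly as follows. This is
only an illustration under loud assumptions: READINGS is a made-up toy
dictionary, the Kanji test covers only the basic CJK Unified Ideographs
range, and the "tokenizer" is a greedy longest match, not what Kakasi or
any real engine does. It is exactly the naive dictionary lookup that the
second step warns about.

```python
import re

# Hypothetical toy dictionary (an assumption for this sketch); a real
# system needs a full lexicon plus context analysis.
READINGS = {
    "日本": "にほん",
    "今日": "きょう",
    "日": "ひ",
    "本": "ほん",
}

# Runs of characters in the basic CJK Unified Ideographs block.
KANJI_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_longest(run):
    """Greedy longest-match segmentation of a Kanji run against READINGS.

    Returns (word, reading) pairs; reading is None for unknown Kanji.
    """
    out = []
    i = 0
    while i < len(run):
        for j in range(len(run), i, -1):
            word = run[i:j]
            if word in READINGS:
                out.append((word, READINGS[word]))
                i = j
                break
        else:
            out.append((run[i], None))  # no entry: leave it unannotated
            i += 1
    return out


def furiganize(text):
    """Append '(reading)' after each Kanji word found in READINGS."""
    def annotate(match):
        return "".join(
            word if reading is None else f"{word}({reading})"
            for word, reading in split_longest(match.group())
        )
    return KANJI_RUN.sub(annotate, text)


if __name__ == "__main__":
    print(furiganize("今日は日本の本を読む"))
```

Note that the lookup always reads 今日 as きょう, although in other
contexts the same word is read こんにち, and it has no way to tell the
two apart. That is the "context driven" problem in miniature.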

Anyway, it's fair to say furigana handling is a big issue in Japanese
text handling. Good luck.

Dan the Man with Too Many Kanjis Unable to Read


