RE: About Kana folding

From: Yves Arrouye (yves@realnames.com)
Date: Fri May 18 2001 - 01:42:01 EDT

Next message: Martin Duerst: "Re: UTF-8 signature in web and email"
Previous message: Christopher JS Vance: "Re: [OT] bits and bytes"
Maybe in reply to: Yves Arrouye: "About Kana folding"

Kenneth,

Thanks for the explanations.

> So I'd suggest you be very careful when trying to do this kind of
> a folding. If it is just for surface text matching, the number of
> false positive matches would likely swamp the number of false
> negatives you'd be correcting.
>
> On the other hand, if you are doing a phonetic matching, then of
> course you have to fold the Hiragana and Katakana forms together.

I am trying to work around a situation where people cannot register a
database key in Katakana and the same one in Hiragana (because the DB's
collation does some Kana folding), yet they need to be able to find it using
either of these (after this key has been migrated to some other system that
doesn't do Kana folding). I don't know if that's what you call surface text
matching. The matching will be done on the whole key, not using N-grams.
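To make the whole-key comparison concrete, here is a minimal sketch of one way such a fold could be done. It assumes only full-width Katakana letters need folding and relies on the Unicode layout in which Katakana U+30A1..U+30F6 sit exactly 0x60 above Hiragana U+3041..U+3096. The names fold_kana and keys_match are hypothetical, and this is not necessarily what the DB's collation does.

```python
# Sketch only: fold full-width Katakana letters to Hiragana so that a key
# registered in one script can be found using the other.
# Assumption: Katakana U+30A1..U+30F6 map to Hiragana U+3041..U+3096 by
# subtracting 0x60. Half-width Katakana and the prolonged sound mark
# (U+30FC) are deliberately left untouched.
_KATAKANA_TO_HIRAGANA = {cp: cp - 0x60 for cp in range(0x30A1, 0x30F7)}


def fold_kana(key: str) -> str:
    """Return key with full-width Katakana letters replaced by Hiragana."""
    return key.translate(_KATAKANA_TO_HIRAGANA)


def keys_match(a: str, b: str) -> bool:
    """Compare two whole keys after kana folding (no N-grams involved)."""
    return fold_kana(a) == fold_kana(b)
```

With this, keys_match("ひらがな", "ヒラガナ") holds, while Kanji and any other characters pass through fold_kana unchanged.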

> The more serious problem of equivalencing for matching in Japanese
> would be kanji versus Hiragana, in particular. [...] Getting this kind
> of thing right is far more important for matching in Japanese than
> just brute matching of Hiragana to Katakana.

And if one wanted to do that automatically (which is not my intent; Kanji
work fine), one would need a dictionary to go from words in Kanji to a
single Kana form. Is that true?

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT