Re: traditional vs simplified chinese

From: David Oftedal (david@start.no)
Date: Thu Feb 13 2003 - 17:19:20 EST


    I say live with it.

    This happens in Japanese as well, and it gets even worse when searching
    in romazi (Japanese written in European letters), because there are so
    many different ways of spelling things, and so many of the Chinese
    loanwords sound exactly the same.

    But when the whole point of the system is to search for the meaning of a
    text and not the exact spelling, we have to live with getting a few
    irrelevant results.
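    A minimal Python sketch of what folding romazi spelling variants for
    search might look like. The rules are an illustrative assumption, not
    taken from any real search engine, and deliberately incomplete: they
    strip macrons and circumflexes, collapse a few long-vowel spellings,
    and map some Hepburn spellings to Kunrei-shiki. The last line shows the
    cost: unrelated words end up with the same key.

```python
import unicodedata

# Illustrative, incomplete rules (an assumption, not a standard):
# collapse common long-vowel spellings to a single vowel.
LONG_VOWELS = [("ou", "o"), ("oo", "o"), ("uu", "u")]

# Illustrative, incomplete rules: fold some Hepburn spellings to
# Kunrei-shiki. Order matters: "shi" must be handled before "sh", etc.
HEPBURN_TO_KUNREI = [
    ("shi", "si"), ("chi", "ti"), ("tsu", "tu"), ("fu", "hu"),
    ("ji", "zi"), ("sh", "sy"), ("ch", "ty"), ("j", "zy"),
]


def fold_romaji(word: str) -> str:
    """Reduce a romanized Japanese word to a lossy key used only for matching."""
    # NFD splits letters such as "ō" or "ô" into a base letter plus a
    # combining mark; the combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", word.lower())
    key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for old, new in LONG_VOWELS + HEPBURN_TO_KUNREI:
        key = key.replace(old, new)
    return key


if __name__ == "__main__":
    # Four spellings of the same place name fold to one key.
    print({fold_romaji(w) for w in ["Tōkyō", "Tôkyô", "Toukyou", "Tokyo"]})
    # The price: "omou" (to think) and "omo" (main) now collide.
    print(fold_romaji("omou"), fold_romaji("omo"))
```

    Both the query and the indexed text go through the same function, so
    the variants meet; the lossy long-vowel rule is exactly the kind of
    trade-off that produces a few irrelevant results.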

    -Dave Oftedal

    Edward H Trager wrote:

    >On Thu, 13 Feb 2003, Rick Cameron wrote:
    >
    >>The Win32 API includes a function that can do this folding, on Windows
    >>NT/2000/XP: LCMapString, with the option LCMAP_SIMPLIFIED_CHINESE or
    >>LCMAP_TRADITIONAL_CHINESE.
    >>
    >>I know little about Chinese, but I have the impression that it is much more
    >>common for several traditional characters to correspond to one simplified
    >>character than vice versa. If that's true, it seems to me that it would make
    >>most sense to fold to simplified.
    >>
    >>- rick
    >
    >Hmmm ... Suppose I'm searching for some relatively obscure traditional
    >character that occurs mostly in Wen Yen (U+6587 U+8A00 文言: Classical
    >Chinese) and has a very specific meaning in Classical Chinese. This
    >character gets "folded" or "mapped" to a fairly common character in modern
    >bai hua (U+767D U+8BDD 白话) Chinese, and then the search proceeds. The result
    >set contains hundreds or thousands of irrelevant results related to the
    >modern meaning, and I still have to sift through them looking for the
    >needles in the haystack. I'll try to provide a concrete example once I
    >think of one ... it's been a long time since I studied Classical Chinese.
    >
    >>This "folding" is much easier than implementing a full-fledged
    >>simplified<->traditional conversion (which needs to be context-sensitive and
    >>dictionary-driven), because the result is just in a temporary buffer used
    >>for comparison, and no one is going to see it.
    >>
    >>_ Marco
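    A minimal Python sketch of the temporary-buffer folding discussed
    above, under loud assumptions: T2S is a hypothetical, hand-picked
    seven-entry table, not the mapping LCMapString uses; a real table could
    be built from a source such as the Unihan database's kSimplifiedVariant
    field. Both sides are folded to simplified, as suggested above, and the
    folded strings are used only for comparison, never displayed.

```python
# Hypothetical sample table, NOT a real conversion table:
# a few traditional -> simplified pairs.
T2S = {
    "發": "发",  # "to issue, send out"
    "髮": "发",  # "hair" (two traditional characters, one simplified)
    "乾": "干",  # "dry" (but qián in Classical Chinese, e.g. 乾坤)
    "幹": "干",  # "trunk; to do"
    "後": "后",  # "after" (后 also means "empress" in its own right)
    "話": "话",  # "speech"
    "語": "语",  # "language"
}

_FOLD_TABLE = str.maketrans(T2S)


def fold(text: str) -> str:
    """Fold to simplified characters; the result is only compared, never shown."""
    return text.translate(_FOLD_TABLE)


def folded_contains(haystack: str, needle: str) -> bool:
    """Substring search after folding both sides the same way."""
    return fold(needle) in fold(haystack)


if __name__ == "__main__":
    print(fold("話語"))                   # 话语
    print(folded_contains("說話", "话"))  # True: traditional text, simplified query
    print(folded_contains("頭髮", "發"))  # True, but 頭髮 means "hair": a false hit
    print(folded_contains("皇后", "後"))  # True, but 皇后 means "empress": a false hit
```

    The last two checks show the objection about irrelevant results in
    miniature: once 發/髮 and 後/后 collapse to one character, the folded
    search cannot tell "issue" from "hair" or "after" from "empress", and
    the false hits have to be sifted out by hand.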

    -- 
    New Norwegian (Nynorsk) is essentially the speech of Norwegian peasants
    as mutilated by a schoolteacher with a poor understanding of Icelandic.
    --Halldór Laxness, via B. Philip Jonsson
    Swedish, Norwegian and Danish are actually the same language. It's just
    that the Norwegians can't spell it, and the Danes can't pronounce it.
    --Chlewey
    


    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 18:02:57 EST