RE: traditional vs simplified chinese

From: Rick Cameron
Date: Thu Feb 13 2003 - 14:37:07 EST


    The Win32 API includes a function that can do this folding, on Windows
    NT/2000/XP: LCMapString, with the option LCMAP_SIMPLIFIED_CHINESE or

    I know little about Chinese, but I have the impression that it is much more
    common for several traditional characters to correspond to one simplified
    character than vice versa. If that's true, it seems to me that it would make
    most sense to fold to simplified.
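
    To illustrate the many-to-one point, here is a minimal Python sketch. The
    table TRAD_TO_SIMP and the helpers fold_to_simplified and collisions are
    hypothetical names, and the five entries are a hand-picked sample, not a
    complete mapping; a real table would come from a maintained data source.

```python
from collections import defaultdict

# Hand-picked sample (an assumption for illustration, far from complete).
TRAD_TO_SIMP = {
    "發": "发",  # to send out, to emit
    "髮": "发",  # hair
    "乾": "干",  # dry
    "幹": "干",  # to do; trunk
    "後": "后",  # after, behind
}


def fold_to_simplified(text: str) -> str:
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(str.maketrans(TRAD_TO_SIMP))


def collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Group traditional characters by the simplified character they share."""
    groups = defaultdict(list)
    for trad, simp in table.items():
        groups[simp].append(trad)
    # Keep only simplified characters reached from more than one traditional one.
    return {simp: trads for simp, trads in groups.items() if len(trads) > 1}


print(collisions(TRAD_TO_SIMP))
```

    Folding in this direction never has to choose between candidates, whereas
    folding to traditional would need context to pick one of several targets.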

    - rick

    -----Original Message-----
    From: Marco Cimarosti
    Sent: Thursday, 13 February 2003 11:13
    Subject: RE: traditional vs simplified chinese

    Paul wrote:
    > To: Edward H Trager
    > > Marco Cimarosti has questioned, why do you need to classify
    > > text as being simplified or traditional?
    > If I understand their needs correctly, it's to implement a
    > search system with search phrases of either "type" of
    > Chinese--content would be in both types.

    Still, I don't see the purpose of "classifying" the user input. What
    they really need is a special collation algorithm that *ignores* the
    difference between corresponding traditional and simplified characters for
    the purpose of searching. This is somewhat analogous to a "caseless" search.

    The easiest way to do this is to "fold" both the user's query and the content
    being searched to the same form (either traditional or simplified, it doesn't
    matter). It may also help to "fold" other kinds of variants besides
    simplified and traditional.

    This "folding" is much easier than implementing a full-fledged
    simplified<->traditional conversion (which needs to be context sensitive and
    dictionary-driven), because the result is just in a temporary buffer used
    for comparison, and no one is going to see it.
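
    A rough Python sketch of such a folded search follows. FOLD_TABLE, fold and
    search are made-up names, and the four-character table is only an assumed
    sample; the point is that both sides are folded into a throwaway key and the
    original text is what gets returned.

```python
# Assumed sample table mapping traditional forms to simplified ones.
# A real system would load a much larger mapping from a data file.
FOLD_TABLE = str.maketrans({
    "漢": "汉",
    "語": "语",
    "學": "学",
    "習": "习",
})


def fold(text: str) -> str:
    """Build a comparison-only key: fold Chinese variants, then fold case."""
    return text.translate(FOLD_TABLE).casefold()


def search(query: str, documents: list[str]) -> list[str]:
    """Return the documents, unmodified, whose folded form contains the folded query."""
    key = fold(query)
    return [doc for doc in documents if key in fold(doc)]


documents = ["我在學習漢語", "我在学习汉语", "I am learning Chinese"]

# Either form of the query finds both the traditional and simplified documents.
print(search("汉语", documents))
print(search("漢語", documents))
```

    The folded strings exist only inside search, so a lossy or linguistically
    crude mapping does no harm to what the user eventually reads.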

    _ Marco
