Re: Standartising search for similar symbols

From: Mark Crispin (mrc+unicode@panda.com)
Date: Fri Nov 13 2009 - 13:20:45 CST

    If the text editor uses i;unicode-casemap (RFC 5051) for its search, it
    will find both. This is because U+2212 decomposes to U+002D.

    It is certainly possible to find examples in which i;unicode-casemap won't
    bail you out; it was intended to be a very basic first-level collation
    that is simple to implement. But at least in this example, you have what
    you want.
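
    Whether a particular pair of characters falls together under this kind
    of search can be checked directly against the Unicode Character Database
    instead of taken on trust. Below is a minimal sketch in Python using only
    the standard unicodedata module. The helper names fold and find_all are
    made up for illustration, and NFKD plus casefold() is only a rough
    stand-in for the decomposition step; it is not an implementation of
    RFC 5051. In the character data bundled with current Python releases,
    U+FF0D FULLWIDTH HYPHEN-MINUS carries a decomposition to U+002D, whereas
    U+2212 MINUS SIGN has an empty decomposition field, so the minus-sign
    case is worth testing before relying on it.

```python
import unicodedata


def fold(text):
    """Rough comparison key: compatibility decomposition plus case folding.

    Assumption: this only approximates a decomposition-based collation;
    it is NOT the RFC 5051 algorithm.
    """
    return unicodedata.normalize("NFKD", text).casefold()


def find_all(needle, haystack):
    """Return the lines of haystack whose folded form contains the folded needle."""
    key = fold(needle)
    return [line for line in haystack.splitlines() if key in fold(line)]


# Inspect the raw decomposition field for each candidate character.
for ch in ("\u002D", "\u2212", "\uFF0D"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          repr(unicodedata.decomposition(ch)))

# Search a small sample that mixes the three characters.
sample = "3-2*4\n3\u22122*5\n3\uFF0D2*6"
print(find_all("3-2", sample))
```

    Lines that only differ by a character with no decomposition mapping will
    not be returned by find_all; catching those would need an extra
    equivalence table on top of the decomposition step.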

    Best wishes,

    -- Mark --

    On Fri, 13 Nov 2009, sergey wrote:
    > Please imagine that we have a big text file. At the beginning of this file
    > someone wrote:
    > 3-2*4
    > The "-" here is U+002D.
    > In the middle of the file someone else wrote:
    > 3−2*5
    > The "−" here is U+2212.
    > Now imagine that you see "3-2*4" and want to find everything that means
    > "3 minus 2" in the file. You ask your text editor to search for "3-2". It
    > will find only "3-2*4", not both, because "-" and "−" have different
    > code points in Unicode.

    -- Mark --

    http://panda.com/mrc
    Democracy is two wolves and a sheep deciding what to eat for lunch.
    Liberty is a well-armed sheep contesting the vote.
