Re: Word break tests

From: Daniel Ehrenberg (microdan@gmail.com)
Date: Wed Jan 07 2009 - 13:03:06 CST

    I'm sorry, this was an error on my end. Ignore that message.

    On Wed, Jan 7, 2009 at 12:38 PM, Daniel Ehrenberg <microdan@gmail.com> wrote:
    > I'm implementing UAX #29 word breaking (without tailoring). Right now,
    > I've implemented the algorithm except that I treat rules like
    >
    > Numeric (MidNum | MidNumLet) × Numeric
    >
    > as
    >
    > (MidNum | MidNumLet) × Numeric
    >
    > The funny thing is, though, that all unit tests in WordBreakTest.txt
    > pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
    > reading of the UAX, this is incorrect, and the correct word
    > segmentation would be /foo/:/ /bar/. For my own project, I'll add some
    > additional unit tests, unless I've misread the standard. It seems to
    > me like these tests should be added to the WordBreakTest.txt file, and
    > I'd be glad to supply them. Is this possible?
    >
    > Dan
    >
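
    The difference between the two rule forms quoted above can be sketched
    in Python. This is only an illustration under stated assumptions: the
    property table is a toy stand-in for the real WordBreakProperty.txt data,
    and the function names are hypothetical, not taken from UAX #29 or from
    any library. Only this one rule is checked, not the full algorithm.

```python
# Toy property table (an assumption for illustration); a real implementation
# would load the Word_Break property values from WordBreakProperty.txt.
TOY_CLASSES = {",": "MidNum", ".": "MidNumLet"}

MID = {"MidNum", "MidNumLet"}


def word_break_class(ch):
    """Return a simplified Word_Break class for a single character."""
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    return TOY_CLASSES.get(ch, "Other")


def full_rule_forbids_break(text, i):
    """Numeric (MidNum | MidNumLet) x Numeric, for the boundary before text[i]."""
    if i < 2 or i >= len(text):
        return False
    return (word_break_class(text[i - 2]) == "Numeric"
            and word_break_class(text[i - 1]) in MID
            and word_break_class(text[i]) == "Numeric")


def truncated_rule_forbids_break(text, i):
    """(MidNum | MidNumLet) x Numeric, dropping the leading Numeric context."""
    if i < 1 or i >= len(text):
        return False
    return (word_break_class(text[i - 1]) in MID
            and word_break_class(text[i]) == "Numeric")
```

    For "1,2" both forms forbid a break before "2". For "a,2" only the
    truncated form does, because "a" is not Numeric, so inputs of that shape
    are the ones that would tell the two implementations apart.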


