Re: Word break tests

From: Mark Davis (mark.edward.davis@gmail.com)
Date: Wed Jan 07 2009 - 13:48:11 CST


    Even if it was an error on your end, we'd welcome any good additional test cases you have.

    Mark

    On Wed, Jan 7, 2009 at 11:03, Daniel Ehrenberg <microdan@gmail.com> wrote:

    > I'm sorry, this was an error on my end. Ignore that message.
    >
    > On Wed, Jan 7, 2009 at 12:38 PM, Daniel Ehrenberg <microdan@gmail.com> wrote:
    > > I'm implementing UAX #29 word breaking (without tailoring). Right now,
    > > I've implemented the algorithm except that I treat rules like
    > >
    > > Numeric (MidNum | MidNumLet) × Numeric
    > >
    > > as
    > >
    > > (MidNum | MidNumLet) × Numeric
    > >
    > > The funny thing is, though, that all unit tests in WordBreakTest.txt
    > > pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
    > > reading of the UAX, this is incorrect, and the correct word
    > > segmentation would be /foo/:/ /bar/. For my own project, I'll add some
    > > additional unit tests, unless I've misread the standard. It seems to
    > > me like these tests should be added to the WordBreakTest.txt file, and
    > > I'd be glad to supply them. Is this possible?
    > >
    > > Dan
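    The difference Dan describes comes down to whether a rule checks its far context (the character two positions away) or only the adjacent pair. The sketch below is a minimal, hypothetical illustration of that, not a conforming UAX #29 implementation. It assumes an ASCII-only classifier (`wb_class` is a made-up helper; real code would read WordBreakProperty.txt), treats the colon as MidLetter, and implements only rules WB5 to WB12 plus the WB14 default, omitting WB4 (Extend/Format), WB13 (Katakana) and ExtendNumLet. With `full_context=False`, rules WB6, WB7, WB11 and WB12 drop their far-context requirement, which reproduces the /foo:/ /bar/ result.

    ```python
    # Hypothetical ASCII-only Word_Break classifier (an assumption for this
    # sketch); a real implementation would use the Unicode Character Database.
    def wb_class(ch: str) -> str:
        if ch.isascii() and ch.isalpha():
            return "ALetter"
        if ch.isascii() and ch.isdigit():
            return "Numeric"
        if ch == ":":
            return "MidLetter"
        if ch == ".":
            return "MidNumLet"
        if ch == ",":
            return "MidNum"
        return "Other"

    MID_LETTER = {"MidLetter", "MidNumLet"}
    MID_NUM = {"MidNum", "MidNumLet"}

    def no_break(c: list, i: int, full_context: bool) -> bool:
        """True if rules WB5-WB12 forbid a boundary between c[i-1] and c[i]."""
        before, after = c[i - 1], c[i]
        before2 = c[i - 2] if i >= 2 else None
        after2 = c[i + 1] if i + 1 < len(c) else None
        # WB5, WB8, WB9, WB10: letters and digits in any combination stay together.
        if before in ("ALetter", "Numeric") and after in ("ALetter", "Numeric"):
            return True
        # WB6: ALetter x (MidLetter | MidNumLet) ALetter
        if before == "ALetter" and after in MID_LETTER and (
                not full_context or after2 == "ALetter"):
            return True
        # WB7: ALetter (MidLetter | MidNumLet) x ALetter
        if before in MID_LETTER and after == "ALetter" and (
                not full_context or before2 == "ALetter"):
            return True
        # WB11: Numeric (MidNum | MidNumLet) x Numeric
        if before in MID_NUM and after == "Numeric" and (
                not full_context or before2 == "Numeric"):
            return True
        # WB12: Numeric x (MidNum | MidNumLet) Numeric
        if before == "Numeric" and after in MID_NUM and (
                not full_context or after2 == "Numeric"):
            return True
        return False  # WB14: break everywhere else

    def segment(text: str, full_context: bool = True) -> list:
        """Split text at word boundaries; full_context=False mimics the shortcut."""
        if not text:
            return []
        c = [wb_class(ch) for ch in text]
        segments, start = [], 0
        for i in range(1, len(text)):
            if not no_break(c, i, full_context):
                segments.append(text[start:i])
                start = i
        segments.append(text[start:])
        return segments

    print(segment("foo: bar"))                      # ['foo', ':', ' ', 'bar']
    print(segment("foo: bar", full_context=False))  # ['foo:', ' ', 'bar']
    print(segment("x.5"), segment("x.5", full_context=False))
    ```

    Note that "3.5" segments as a single word either way, which is one reason a test file built mostly from such cases can pass despite the shortcut; inputs like "foo: bar" or "x.5", where the far context fails to match, are what expose it.
    
    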



    This archive was generated by hypermail 2.1.5 : Wed Jan 07 2009 - 13:50:23 CST