Word break tests

From: Daniel Ehrenberg (microdan@gmail.com)
Date: Wed Jan 07 2009 - 12:38:14 CST


    I'm implementing UAX #29 word breaking (without tailoring). Right now,
    I've implemented the algorithm, except that I treat rules like

    Numeric (MidNum | MidNumLet) × Numeric

    as

    (MidNum | MidNumLet) × Numeric

    The funny thing is, though, that all unit tests in WordBreakTest.txt
    pass. But a string like "foo: bar" segments as /foo:/ /bar/. By my
    reading of the UAX, this is incorrect, and the correct word
    segmentation would be /foo/:/ /bar/. For my own project, I'll add some
    additional unit tests, unless I've misread the standard. It seems to
    me like these tests should be added to the WordBreakTest.txt file, and
    I'd be glad to supply them. Is this possible?
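
    A minimal sketch of the difference described above, not the actual
    implementation. It covers only WB5 through WB12 and WB14, and the
    property table is a toy assumption (":" as MidLetter, "." and "'" as
    MidNumLet, "," and ";" as MidNum), not the real WordBreakProperty.txt
    data. The names wb_class, no_break, segment and with_context are
    hypothetical. Passing with_context=False drops the extra context from
    the rules, as in the shortcut above.

```python
# Toy property table: an assumption for illustration only, not the real
# WordBreakProperty.txt data.
def wb_class(ch):
    if ch.isdigit():
        return "Numeric"
    if ch.isalpha():
        return "ALetter"
    if ch == ":":
        return "MidLetter"
    if ch in ".'":
        return "MidNumLet"
    if ch in ",;":
        return "MidNum"
    return "Other"


def no_break(cls, i, with_context):
    """Return True if there is no boundary between positions i-1 and i."""
    def at(k):
        return cls[k] if 0 <= k < len(cls) else None

    prev, cur = at(i - 1), at(i)
    mid_let = ("MidLetter", "MidNumLet")
    mid_num = ("MidNum", "MidNumLet")

    # WB5, WB8, WB9, WB10: letters and digits stick together.
    if prev in ("ALetter", "Numeric") and cur in ("ALetter", "Numeric"):
        return True
    # WB6: ALetter × (MidLetter | MidNumLet) ALetter
    if prev == "ALetter" and cur in mid_let:
        return at(i + 1) == "ALetter" if with_context else True
    # WB7: ALetter (MidLetter | MidNumLet) × ALetter
    if prev in mid_let and cur == "ALetter":
        return at(i - 2) == "ALetter" if with_context else True
    # WB11: Numeric (MidNum | MidNumLet) × Numeric
    if prev in mid_num and cur == "Numeric":
        return at(i - 2) == "Numeric" if with_context else True
    # WB12: Numeric × (MidNum | MidNumLet) Numeric
    if prev == "Numeric" and cur in mid_num:
        return at(i + 1) == "Numeric" if with_context else True
    return False  # WB14: break everywhere else


def segment(text, with_context=True):
    """Split text into segments at every boundary the rules allow."""
    cls = [wb_class(ch) for ch in text]
    pieces, start = [], 0
    for i in range(1, len(text)):
        if not no_break(cls, i, with_context):
            pieces.append(text[start:i])
            start = i
    if text:
        pieces.append(text[start:])
    return pieces


print(segment("foo: bar", with_context=False))  # ['foo:', ' ', 'bar']
print(segment("foo: bar"))                      # ['foo', ':', ' ', 'bar']
print(segment("3.14", with_context=False))      # ['3.14']
```

    Under these assumptions, both variants agree on inputs such as "3.14"
    or "can't", where the Mid* character sits between two letters or two
    digits. That is consistent with a test file containing only such cases
    passing either way. An input like "foo: bar", where the Mid* character
    is followed by something else, tells the two apart.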

