Re: UAX 29

From: Mark Davis
Date: Wed Aug 22 2007 - 11:47:02 CDT

    In ICU these are called BreakIterators; the ICU documentation describes them, along with
    the rule format for building the tables if you want to customize them.
    If you have more questions, you can pose them to the ICU list.


    On 8/22/07, Daniel Ehrenberg wrote:
    > Hi,
    > I'm reading UAX 29 in order to implement grapheme boundaries (and
    > later word and sentence boundaries) for a Unicode library for the
    > Factor programming language. So far, for grapheme boundary detection,
    > I have a basically direct implementation of the conditions listed for
    > boundaries, where I iterate through the string, checking each
    > connectedness condition, and if they all fail, returning a grapheme
    > break. This implementation works, but I'm wondering about a
    > table-based implementation, which could be faster and allow tailoring
    > (my implementation doesn't really allow that, except for rewriting
    > it). The UAX frequently references table-based implementations, but it
    > never describes what they are exactly or how I might go about
    > implementing them. I tried finding the code in ICU for it, but I'm
    > somewhat new at C++ and could not locate where the tables were
    > generated.
    > If someone could help me in this, that would be great.
    > Daniel Ehrenberg
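
For readers wondering what a table-based approach might look like, here is a minimal sketch in Python of one possible form: a pair table indexed by the break class of the character on each side of a candidate boundary. It is not how ICU does it (ICU compiles its rules into state tables), and it is not taken from the UAX. The names `CLASSES`, `build_table`, `classify`, and `graphemes` are hypothetical. The class assignment is a rough approximation: Extend is assumed to be general categories Mn and Me, Control is assumed to be Cc, Zl, and Zp, and the Hangul classes come from fixed code point ranges. Only rules that depend on the two adjacent characters are covered.

```python
import unicodedata

# Simplified subset of the Grapheme_Cluster_Break classes (an assumption
# for illustration; the real property data has more detail).
CLASSES = ["Other", "CR", "LF", "Control", "Extend", "L", "V", "T", "LV", "LVT"]
IDX = {name: i for i, name in enumerate(CLASSES)}


def build_table():
    """Build table[left][right]; True means 'break between left and right'."""
    n = len(CLASSES)
    # Default: break everywhere (Any / Any).
    table = [[True] * n for _ in range(n)]

    def no_break(lefts, rights):
        for left in lefts:
            for right in rights:
                table[IDX[left]][IDX[right]] = False

    no_break(CLASSES, ["Extend"])              # do not break before Extend
    no_break(["L"], ["L", "V", "LV", "LVT"])   # Hangul: L x (L | V | LV | LVT)
    no_break(["LV", "V"], ["V", "T"])          # Hangul: (LV | V) x (V | T)
    no_break(["LVT", "T"], ["T"])              # Hangul: (LVT | T) x T
    # Controls break on both sides, overriding the Extend rule above.
    for ctl in ["CR", "LF", "Control"]:
        for other in CLASSES:
            table[IDX[ctl]][IDX[other]] = True
            table[IDX[other]][IDX[ctl]] = True
    no_break(["CR"], ["LF"])                   # ...except CR x LF stays together
    return table


TABLE = build_table()


def classify(ch):
    """Map a character to a break class index (approximate, see lead-in)."""
    cp = ord(ch)
    if ch == "\r":
        return IDX["CR"]
    if ch == "\n":
        return IDX["LF"]
    if 0x1100 <= cp <= 0x115F:
        return IDX["L"]
    if 0x1160 <= cp <= 0x11A7:
        return IDX["V"]
    if 0x11A8 <= cp <= 0x11FF:
        return IDX["T"]
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return IDX["LV"] if (cp - 0xAC00) % 28 == 0 else IDX["LVT"]
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return IDX["Extend"]
    if cat in ("Cc", "Zl", "Zp"):
        return IDX["Control"]
    return IDX["Other"]


def graphemes(text):
    """Split text into clusters with one table lookup per adjacent pair."""
    clusters = []
    start = 0
    for i in range(1, len(text)):
        if TABLE[classify(text[i - 1])][classify(text[i])]:
            clusters.append(text[start:i])
            start = i
    if text:
        clusters.append(text[start:])
    return clusters
```

With this layout, tailoring amounts to changing entries in `TABLE` or adjusting `classify`, while the scanning loop stays the same. Word and sentence boundaries need more context than one adjacent pair, so a plain pair table like this one would probably not be enough for them.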

