Open Source C++ implementation of ICU's BreakIterator framework?

From: Samphan Raruenrom
Date: Wed Jan 06 1999 - 03:17:36 EST


I've implemented a C++ BreakIterator for Thai and use it in Mozilla :-
Is it ok to adopt the BreakIterator API like this?

Your implementation of Thai contextual word breaking in
the article is very impressive. :-)
So now ICU is ready for Thai in addition to CJK, etc.
Is it possible for IBM to make their C++ version Open Source?


Erik van der Poel wrote:
> Samphan Raruenrom wrote:
> > Frank Tang wrote:
> > > Text Boundary Analysis in Java
> > >
> >
> > Actually, I think we should implement exactly the same
> > thing (C++ BreakIterator in ICU) as open source,
> > shouldn't we?
> I think we should use the exact same API as the C++ version of the new
> BreakIterator (except that ours should be XPCOM). We can start building
> the implementation under that API in an Open Source fashion.
> *If* IBM decides to make their C++ version Open Source, and if it is
> better than Mozilla's at that point in time, then it should be
> relatively easy to adopt it.
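
To make the "same API, separate implementation" idea concrete, here is a minimal sketch of a BreakIterator-style interface in plain C++. It is not the ICU API and not XPCOM: the names BreakIteratorSketch, SpaceBreaker and kDone are hypothetical, and the space-based breaker is only a stand-in for a real Thai or CJK implementation. The point is that callers see only the abstract cursor, so a different implementation could be swapped in behind it later.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical interface, loosely modelled on the BreakIterator idea:
// a cursor that steps between boundary offsets in a text.
class BreakIteratorSketch {
public:
    enum { kDone = -1 };  // returned when no boundaries remain
    virtual ~BreakIteratorSketch() = default;
    virtual void setText(std::u16string text) = 0;
    virtual int first() = 0;          // move to the first boundary
    virtual int next() = 0;           // move to the next boundary, or kDone
    virtual int current() const = 0;  // most recently returned boundary
};

// Stand-in implementation (an assumption for illustration only):
// a boundary falls wherever the text switches between space and non-space.
class SpaceBreaker : public BreakIteratorSketch {
public:
    void setText(std::u16string text) override {
        text_ = std::move(text);
        pos_ = 0;
    }
    int first() override {
        pos_ = 0;
        return pos_;
    }
    int next() override {
        const int n = static_cast<int>(text_.size());
        if (pos_ >= n) return kDone;
        const bool space = (text_[pos_] == u' ');
        // Advance until the space/non-space class changes or the text ends.
        do {
            ++pos_;
        } while (pos_ < n && (text_[pos_] == u' ') == space);
        return pos_;
    }
    int current() const override { return pos_; }

private:
    std::u16string text_;
    int pos_ = 0;
};
```

A caller would write a loop such as `for (int b = it.first(); b != BreakIteratorSketch::kDone; b = it.next())` and never need to know which implementation sits behind the interface.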

Erik van der Poel wrote:
> Mark Leisher wrote:
> > Has the problem of finding good line wrapping points for Chinese and Japanese
> > been handled yet?
> No, we haven't implemented this in the NGLayout-based code yet. It would
> be nice if we could come up with a single solution that covered the
> special rules of Chinese and Japanese, and also the Thai line breaking
> rules. Ideally, we would like to support UTF-8 documents that might
> contain a mixture of Thai and Japanese (say).
> Furthermore, it would be nice if the Japanese code/data was only loaded
> when necessary (i.e. when Japanese (or Unicode Han) data is
> encountered). And the Thai line breaking dictionary should only be
> loaded if the document contains Thai.
> And all of this through XPCOM interfaces, so that you can drop
> additional language support in at a later date.
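
The load-on-demand behaviour described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names Script, scriptOf, LineBreaker, PermissiveBreaker and BreakerRegistry are all hypothetical, the script detection is a crude check of a few Unicode blocks, and plain C++ factories stand in for whatever XPCOM component lookup would really be used. A breaker (and whatever dictionary or rule data it owns) is constructed only the first time a character of its script is seen.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>

enum class Script { Other, Thai, Han, Kana };

// Crude classification by Unicode block (an assumption for this sketch).
Script scriptOf(char32_t c) {
    if (c >= 0x0E00 && c <= 0x0E7F) return Script::Thai;  // Thai block
    if (c >= 0x4E00 && c <= 0x9FFF) return Script::Han;   // CJK Unified Ideographs
    if (c >= 0x3040 && c <= 0x30FF) return Script::Kana;  // Hiragana and Katakana
    return Script::Other;
}

// Hypothetical per-language line breaking interface.
struct LineBreaker {
    virtual ~LineBreaker() = default;
    virtual bool canBreakBetween(char32_t before, char32_t after) const = 0;
};

// Placeholder breaker that allows a break anywhere; a real one would
// consult rules or a dictionary loaded in its constructor.
struct PermissiveBreaker : LineBreaker {
    bool canBreakBetween(char32_t, char32_t) const override { return true; }
};

class BreakerRegistry {
public:
    using Factory = std::function<std::unique_ptr<LineBreaker>()>;

    // Registering a factory is cheap; nothing is loaded yet.
    void registerFactory(Script s, Factory f) { factories_[s] = std::move(f); }

    // Returns the breaker for the script of c, creating it on first use.
    // Returns nullptr if no support for that script has been registered.
    LineBreaker* breakerFor(char32_t c) {
        const Script s = scriptOf(c);
        auto hit = loaded_.find(s);
        if (hit != loaded_.end()) return hit->second.get();
        auto f = factories_.find(s);
        if (f == factories_.end()) return nullptr;
        auto inserted = loaded_.emplace(s, f->second());
        return inserted.first->second.get();
    }

    std::size_t loadedCount() const { return loaded_.size(); }

private:
    std::map<Script, Factory> factories_;
    std::map<Script, std::unique_ptr<LineBreaker>> loaded_;
};
```

With this shape, a document that mixes Thai and Japanese would cause both breakers to be created, while a purely Latin document would load neither. Additional language support can be dropped in later by registering another factory.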
