Open Source C++ implementation of ICU's BreakIterator framework?

From: Samphan Raruenrom (samphan@thai.com)
Date: Wed Jan 06 1999 - 03:17:36 EST


Hi,

I've implemented a C++ BreakIterator for Thai and am using it in Mozilla:
http://www.thai.net/libinthai/
Is it ok to adopt the BreakIterator API like this?

Your implementation of Thai contextual word breaking in
the article is very impressive. :-)
So now ICU is ready for Thai in addition to CJK, etc.
Is it possible for IBM to make their C++ version Open Source?

Samphan.

Erik van der Poel wrote:
> Samphan Raruenrom wrote:
> > Frank Tang wrote:
> > > Text Boundary Analysis in Java
> > > http://www.ibm.com/java/education/boundaries/boundaries.html
> >
> > Actually, I think we should implement exactly the same
> > thing (C++ BreakIterator in ICU) as open source,
> > shouldn't we?
>
> I think we should use the exact same API as the C++ version of the new
> BreakIterator (except that ours should be XPCOM). We can start building
> the implementation under that API in an Open Source fashion.
>
> *If* IBM decides to make their C++ version Open Source, and if it is
> better than Mozilla's at that point in time, then it should be
> relatively easy to adopt it.
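
To make the "same API, separate implementation" idea concrete, here is a minimal sketch of what such an interface might look like in C++. The class names TextBreakIterator and SpaceBreakIterator are hypothetical; the first()/next()/current() shape is only loosely modeled on java.text.BreakIterator, and this is neither IBM's C++ API nor an XPCOM interface. The toy implementation breaks on runs of ASCII spaces purely to show the calling pattern; a Thai breaker would sit behind the same interface.

```cpp
#include <string>
#include <utility>

// Hypothetical interface, loosely modeled on the first()/next()/current()
// style of java.text.BreakIterator. Not ICU's or Mozilla's actual API.
class TextBreakIterator {
public:
    enum { kDone = -1 };  // returned by next() when no boundary remains
    virtual ~TextBreakIterator() = default;
    virtual void setText(std::string text) = 0;
    virtual int first() = 0;          // move to offset 0 and return it
    virtual int next() = 0;           // advance to the next boundary
    virtual int current() const = 0;  // most recently returned boundary
};

// Toy implementation: a boundary falls wherever the text switches between
// a run of ASCII spaces and a run of non-space bytes.
class SpaceBreakIterator : public TextBreakIterator {
public:
    void setText(std::string text) override {
        text_ = std::move(text);
        pos_ = 0;
    }
    int first() override {
        pos_ = 0;
        return pos_;
    }
    int next() override {
        const int n = static_cast<int>(text_.size());
        if (pos_ >= n) return kDone;
        const bool inSpace = (text_[pos_] == ' ');
        int i = pos_ + 1;
        while (i < n && (text_[i] == ' ') == inSpace) ++i;
        pos_ = i;
        return pos_;
    }
    int current() const override { return pos_; }

private:
    std::string text_;
    int pos_ = 0;
};
```

A caller written against TextBreakIterator would not need to change if a different implementation were swapped in later, which is the property the quoted message is after.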

Erik van der Poel wrote:
> Mark Leisher wrote:
> > Has the problem of finding good line wrapping points for Chinese and Japanese
> > been handled yet?
> No, we haven't implemented this in the NGLayout-based code yet. It would
> be nice if we could come up with a single solution that covered the
> special rules of Chinese and Japanese, and also the Thai line breaking
> rules. Ideally, we would like to support UTF-8 documents that might
> contain a mixture of Thai and Japanese (say).
>
> Furthermore, it would be nice if the Japanese code/data was only loaded
> when necessary (i.e. when Japanese (or Unicode Han) data is
> encountered). And the Thai line breaking dictionary should only be
> loaded if the document contains Thai.
>
> And all of this through XPCOM interfaces, so that you can drop
> additional language support in at a later date.
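
The load-on-demand behaviour described above could be sketched roughly as follows. All names here (Script, classify, BreakData, LazyBreakDataRegistry) are hypothetical, the text is assumed to be already decoded from UTF-8 into code points, and the script test is a crude Unicode block-range check (Thai U+0E00..U+0E7F, Hiragana/Katakana U+3040..U+30FF, CJK Unified Ideographs U+4E00..U+9FFF). Plain C++ objects stand in for XPCOM components.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

enum class Script { Other, Thai, Japanese };

// Rough classification by Unicode block (an assumption for this sketch;
// a real implementation would need more ranges and finer rules).
inline Script classify(char32_t c) {
    if (c >= 0x0E00 && c <= 0x0E7F) return Script::Thai;      // Thai
    if (c >= 0x3040 && c <= 0x30FF) return Script::Japanese;  // Hiragana, Katakana
    if (c >= 0x4E00 && c <= 0x9FFF) return Script::Japanese;  // Han ideographs
    return Script::Other;
}

// Stand-in for a dictionary or rule table that is expensive to load.
struct BreakData {
    std::string name;
};

class LazyBreakDataRegistry {
public:
    using Loader = std::function<std::unique_ptr<BreakData>()>;

    // Language support can be registered at any time, including later on.
    void registerLoader(Script s, Loader loader) {
        loaders_[s] = std::move(loader);
    }

    // Scan the text and load data for each script the first time it is seen.
    void prepare(const std::u32string& text) {
        for (char32_t c : text) {
            const Script s = classify(c);
            if (loaded_.count(s) != 0) continue;  // already loaded
            auto it = loaders_.find(s);
            if (it != loaders_.end()) loaded_[s] = it->second();
        }
    }

    bool isLoaded(Script s) const { return loaded_.count(s) != 0; }

private:
    std::map<Script, Loader> loaders_;
    std::map<Script, std::unique_ptr<BreakData>> loaded_;
};
```

With this arrangement a document containing no Thai never triggers the Thai loader, and a mixed Thai and Japanese document triggers each loader exactly once.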


