Use Of ICU BreakIterator In Lexical Analysis Of Multiple Languages
Sean Callanan - IBM Ireland
The IBM Dictionary and Linguistic Tools Group produces linguistic analysis tools that support over 30 languages. This presentation describes the use of the ICU BreakIterator in lexical analysis: how ICU is used, and how we build on this technology to solve some lexical analysis issues, such as identifying:
The International Components for Unicode (ICU) is a C and C++ library that provides robust and full-featured Unicode support on a wide variety of platforms.
The ICU BreakIterator maintains a current position and scans over text, returning the index of each character at which a boundary occurs. Word boundary analysis is used by search-and-replace functions, as well as by text editing applications that let the user select a word with a double click.
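The scanning model described above can be illustrated with a minimal sketch. It is written against the JDK's `java.text.BreakIterator` rather than the ICU C/C++ library itself, on the assumption that the JDK class is close enough to show the same pattern of keeping a current position and stepping from boundary to boundary. The class name `WordBoundaryDemo`, the helper methods `boundaries` and `words`, and the letter-or-digit filter used to discard punctuation and whitespace segments are hypothetical choices made for illustration; they are not the approach taken in the presentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: uses the JDK BreakIterator as a stand-in for ICU.
final class WordBoundaryDemo {

    /** Collects every boundary index the word iterator reports for the text. */
    static List<Integer> boundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        // first() moves to the start of the text; next() advances to the
        // following boundary and returns DONE once the end has been passed.
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    /**
     * Returns the segments between consecutive boundaries that begin with a
     * letter or digit. This filter is an assumption of the sketch: the
     * iterator also reports segments for spaces and punctuation.
     */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Hello, world!";
        System.out.println(boundaries(sample, Locale.ENGLISH));
        System.out.println(words(sample, Locale.ENGLISH));
    }
}
```

A double-click selection in an editor could be built on the same iterator by finding the pair of adjacent boundaries that encloses the clicked offset, instead of walking the whole text.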
When the world wants to talk, it speaks Unicode
International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS).
GMS is pleased to be able to offer the International Unicode Conferences under an exclusive
license granted by the Unicode Consortium. All responsibility for conference finances and
operations is borne by GMS. The independent conference board serves solely at the pleasure
of GMS and is composed of volunteers active in Unicode and in international software
development. All inquiries regarding International Unicode Conferences should be addressed
Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.
21 February 2002, Webmaster