UnicodeIUC20
Program Showcase Registration Accommodation Travel Sponsors
Unicode Standard Conference Board Conference CD Last Conference Past Conferences Next Conference
Abstract

Creating custom break iterators for ICU (International Components for Unicode)

Edward Batutis - Batutis Internationalization Consulting

Intended Audience: Software Engineers
Session Level: Intermediate, Advanced

This paper discusses creating custom break iterators for International Components for Unicode (ICU), a popular internationalization toolkit. ICU for Java and ICU for C/C++ provide break iterators for character, word, and line breaking. These iterators are useful for parsing text, for example extracting words for a search engine or implementing a word-wrap feature in a text editor. The supplied break iterators are sufficient for many purposes, but some implementors may wish to use their own customized iterators. The paper first discusses the default break iterators supplied by ICU for Java and C/C++ and how they are implemented. It then covers how the existing iterators can be extended or replaced to meet application-specific requirements.
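As an illustration of the word-extraction use case mentioned above, the following is a minimal sketch rather than code from the paper. To stay self-contained it uses the JDK's java.text.BreakIterator as a stand-in; ICU for Java's com.ibm.icu.text.BreakIterator offers the same factory and iteration methods used here (getWordInstance, setText, first, next, DONE). The class name WordExtractor and its method names are hypothetical.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: collects the word-like segments of a string,
// as a search-engine indexer might do before indexing.
final class WordExtractor {

    // Walks the word boundaries of `text` and keeps only the segments
    // that contain at least one letter or digit (skipping spaces and punctuation).
    static List<String> extractWords(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);

        List<String> words = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (containsLetterOrDigit(segment)) {
                words.add(segment);
            }
        }
        return words;
    }

    // True if any code point in the segment is a letter or a digit.
    private static boolean containsLetterOrDigit(String segment) {
        return segment.codePoints().anyMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        System.out.println(extractWords("Hello, world!", Locale.ENGLISH));
    }
}
```

The default iterator reports every boundary, including those around spaces and punctuation, so the caller decides which segments count as words. Where that filtering is not enough, ICU for Java also lets an application construct a RuleBasedBreakIterator from its own rule string, which is one way the supplied iterators can be replaced; the rules themselves are outside the scope of this sketch.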



International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

4 November 2001, Webmaster