UnicodeIUC22
Abstract

Compact Encodings of Unicode

Markus Scherer - IBM Corporation

Intended Audience: Software Engineers, Systems Analysts, Content Developers
Session Level: Beginner, Intermediate

This talk discusses ways to reduce the size of Unicode text in files and protocols by choosing a compact encoding, optionally combined with general-purpose compression.

Unicode is often perceived as "too big", increasing text size compared to traditional codepages. This concern is raised especially for systems with limited connection bandwidth, such as dial-up or long-range wireless networks, and for devices with little memory, such as PDAs and cell phones.

Several Unicode encodings are available, each with different size characteristics. After an overview of UTF-8, UTF-16, SCSU, and BOCU-1, the talk discusses their use in different environments and compares them with general-purpose compression and traditional codepages. Comparison numbers based on the ICU implementation are presented, and software support for the encoding and compression schemes is discussed.
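
As a rough illustration of this kind of size comparison, the Python sketch below encodes a short sample in UTF-8, UTF-16, and a traditional codepage, and reports the byte count with and without general-purpose compression (zlib). It is not the ICU-based measurement presented in the talk. Python's standard library has no SCSU or BOCU-1 codec, so those two encodings are left out. The function name `encoded_sizes`, the sample sentences, and the choice of codepages are assumptions made for this example, and such short strings are not representative of real documents.

```python
import zlib

def encoded_sizes(text, legacy_codec):
    """Return {codec: (raw_bytes, zlib_bytes)} for one text sample.

    Hypothetical helper for illustration only; SCSU and BOCU-1 are not
    covered because the Python standard library does not provide them.
    """
    sizes = {}
    for codec in ("utf-8", "utf-16-le", legacy_codec):
        raw = text.encode(codec)
        # Level 9 asks zlib for its best (slowest) compression.
        sizes[codec] = (len(raw), len(zlib.compress(raw, 9)))
    return sizes

# Sample sentences and codepages are assumptions chosen for the example.
SAMPLES = {
    "English": ("The quick brown fox jumps over the lazy dog.", "cp1252"),
    "Russian": ("Съешь же ещё этих мягких французских булок.", "koi8-r"),
}

if __name__ == "__main__":
    for language, (text, legacy) in SAMPLES.items():
        print(f"{language}: {len(text)} characters")
        for codec, (raw, packed) in encoded_sizes(text, legacy).items():
            print(f"  {codec:10s} raw={raw:4d} bytes  zlib={packed:4d} bytes")
```

For the English sample, UTF-8 and the codepage give the same byte count, while UTF-16 doubles it. For the Russian sample, the codepage uses one byte per character, UTF-16 uses two, and UTF-8 falls in between because only the spaces and the full stop take a single byte.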


International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

23 May 2002, Webmaster