Compact Encodings of Unicode
Markus Scherer - IBM Corporation
This talk discusses ways to reduce the size of Unicode text in files and protocols by choosing a compact encoding, optionally combined with general-purpose compression.
Unicode is often perceived as "too big" and as increasing text size compared with traditional codepages. These concerns are raised especially for systems with limited connection bandwidth, such as dial-up or long-range wireless networks, and for devices with little memory, such as PDAs and cell phones.
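The size concern can be made concrete with a small sketch. It assumes Python 3 and uses a sample Greek phrase chosen only for illustration. The phrase is encoded in cp1253, the Windows Greek single-byte codepage, and in UTF-8 and UTF-16. Non-Latin text like this takes one byte per character in its legacy codepage but roughly two bytes per character in either Unicode form.

```python
from __future__ import annotations


def encoded_sizes(text: str, codepage: str) -> dict[str, int]:
    """Return the byte length of `text` in a legacy codepage, UTF-8, and UTF-16."""
    return {
        codepage: len(text.encode(codepage)),
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" omits the 2-byte byte order mark that "utf-16" would prepend.
        "utf-16-le": len(text.encode("utf-16-le")),
    }


if __name__ == "__main__":
    # Illustrative sample: 13 Greek letters plus one space (14 characters).
    greek = "Καλημέρα κόσμε"
    for name, size in encoded_sizes(greek, "cp1253").items():
        print(f"{name:>10}: {size} bytes")
```

For this sample, cp1253 needs 14 bytes, UTF-8 needs 27, and UTF-16 needs 28. For pure ASCII text, UTF-8 matches the codepage size and UTF-16 doubles it.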
Several Unicode encodings are available, and they differ in how large the encoded text is. The talk gives an overview of UTF-8/16, SCSU, and BOCU-1, then discusses their use in different environments and compares them with general-purpose compression and traditional codepages. The comparison figures are based on the ICU implementation. Software support for these encoding and compression schemes is also discussed.
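The following sketch shows the kind of measurement behind a comparison with general-purpose compression. It assumes Python 3 and uses the standard-library `zlib` (Deflate) module as a stand-in for "general-purpose compression". It is not the ICU-based setup the talk uses. SCSU and BOCU-1 are not in Python's standard library, so they are left out here. ICU provides converters for both.

```python
import zlib


def compressed_sizes(text: str, encodings=("utf-8", "utf-16-le")) -> dict:
    """Map each encoding name to (raw byte length, zlib-compressed byte length)."""
    result = {}
    for enc in encodings:
        raw = text.encode(enc)
        # Level 9 asks zlib for maximum compression.
        result[enc] = (len(raw), len(zlib.compress(raw, 9)))
    return result


if __name__ == "__main__":
    # Illustrative, highly repetitive sample text; real documents compress less well.
    sample = "Unicode text in files and protocols. " * 50
    for enc, (raw, packed) in compressed_sizes(sample).items():
        print(f"{enc:>10}: {raw} bytes raw, {packed} bytes compressed")
```

For very short strings, zlib's header and checksum can make the output larger than the input. This is one reason a compact encoding can be preferable to general-purpose compression for small messages.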
International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed
Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.
23 May 2002, Webmaster