23rd Internationalization and Unicode Conference

Unicode Encoding Formats: A Devil In Disguise?

Debmalya Biswas - InfoSys Technologies Limited

Intended Audience:	Managers, Software Engineers, Systems Analysts, Content Developers, Technical Writers
Session Level:	Intermediate

Statement Of Purpose:

Unicode.org describes Unicode as follows: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language". Unicode delivers on this promise as long as a platform (operating system) or a language (software language) is considered in isolation. The moment you start implementing international character support across languages and platforms via Unicode, however, you are entering dangerous territory. The reason is the variety of Unicode encoding formats (UTF-7, UTF-8, UTF-16, UTF-32, UCS-2, UCS-4, etc.) used by different languages/platforms.

In Unicode terminology, the number assigned to each character is called a code point. Since an encoding format determines how a code point is represented at the bit level, whenever Unicode data is passed from one language/platform to another that uses a different encoding format, the situation becomes similar to the one that prevailed before Unicode came into being (different charsets being used for the characters of different languages). The paper stresses the need for a better solution to these interoperability woes than the meager format conversion routines provided (or, one might say, not provided) by different languages/platforms.
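As a small illustration of the point above, the following Python sketch encodes a single code point in several encoding formats and then reads the UTF-8 bytes back as if they were UTF-16. The choice of character (U+20AC, EURO SIGN) and the variable names are assumptions made for this example only; they are not taken from the paper.

```python
# One code point, several encoding formats: the byte sequences differ.
text = "\u20ac"  # EURO SIGN, code point U+20AC (illustrative choice)

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print each encoding name next to the bytes it produces, in hex.
    print(f"{name:10} {data.hex()}")

# A receiver that assumes a different encoding format than the sender
# does not get the original character back.
misread = encoded["utf-8"].decode("utf-16-le", errors="replace")
print(misread == text)
```

The code point is the same in every case; only the bytes on the wire differ, which is why both sides of an interface have to agree on the encoding format.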

Paper Description:

This paper attempts to highlight the problems faced by software developers implementing an I18N solution that spans multiple platforms/languages, using a few real-life interoperability examples (such as Java/C++, HTML/ASP/JSP, and SQL Server/ODBC/JDBC). It also analyzes the solutions that some of these languages/platforms provide to overcome these issues.

Conclusion:

Although the various encoding formats are necessary given the differing limitations and requirements of different languages/platforms, the fact remains that their presence is a serious hindrance to interoperability between languages/platforms.

Prerequisite:

A basic understanding of software internationalization and familiarity with Unicode terminology.

12 December 2002, Webmaster