Oracle 9i Multilingual Collation and Asian Sort Supports
Claire Ho & Winson Chu - Oracle Corporation
With the rapid development and broad deployment of internet technologies, multilingual support in software products is drawing increasing attention from companies that intend to expand and grow their global businesses. Oracle has also seen an increase in the number of requests for multilingual support in recent years. Multilingual collation is a crucial part of overall multilingual support: it allows customers and employees to search for information and products more rapidly and accurately in any language. This paper covers the challenges of linguistic, Asian and multilingual collation and how they are supported in Oracle 9i.
This paper discusses the basic Oracle multilingual collation, which is based on ISO/IEC 14651 with the addition of special handling for contracting characters and expanding characters, and run-time checking for composed and decomposed characters based on the Unicode 3.0 canonical equivalence rules. SQL string normalization APIs are also covered to show how this functionality can be used at the SQL level.
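To make the canonical equivalence point concrete, the following is a minimal sketch in Python rather than Oracle SQL. It uses the standard `unicodedata` module, not any Oracle API, and the helper name `canonically_equal` is hypothetical. It shows why a composed and a decomposed spelling of the same character should be treated as equal, and how normalizing before comparison changes a naive code point sort.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b are canonically equivalent."""
    # NFD decomposes precomposed characters and puts combining marks
    # into canonical order, so equivalent strings become identical.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two spellings differ code point by code point ...
raw_equal = composed == decomposed
# ... but are canonically equivalent.
canon_equal = canonically_equal(composed, decomposed)

# A naive code point sort is sensitive to which spelling was stored.
words = ["\u00e9a", "e\u0301b"]  # "éa" (composed), "éb" (decomposed)
naive_order = sorted(words)
normalized_order = sorted(words, key=lambda s: unicodedata.normalize("NFD", s))
```

Here `naive_order` places the decomposed "éb" before the composed "éa" because U+0065 sorts below U+00E9, while `normalized_order` gives the expected "éa", "éb". This only illustrates the equivalence rule; a real collation would go on to map the normalized text to multi-level weights.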
Moreover, among the set of new Asian sorts that Oracle 9i supports, this paper introduces context-sensitive sorting for Japanese and swap-with-next-character sorting for Thai and Lao.
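The swap-with-next-character idea can be sketched as follows. This is a simplified illustration, not Oracle's implementation: it assumes that the characters to swap are the Thai and Lao vowels written before their consonant (U+0E40 to U+0E44 and U+0EC0 to U+0EC4), that each is swapped with whatever character follows it, and that the rearranged string is then compared by code point instead of by real collation weights. The names `PREPOSED_VOWELS` and `swap_key` are hypothetical.

```python
# Thai and Lao vowels that are written before the consonant they follow
# in pronunciation (assumed set for this sketch).
PREPOSED_VOWELS = (
    {chr(c) for c in range(0x0E40, 0x0E45)}     # Thai
    | {chr(c) for c in range(0x0EC0, 0x0EC5)}   # Lao
)


def swap_key(text: str) -> str:
    """Hypothetical sort key: move each preposed vowel after the next character."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS:
            # Swap the vowel with the following character, then skip both.
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


words = [
    "\u0e40\u0e01\u0e21",  # เกม: SARA E + KO KAI + MO MA
    "\u0e02\u0e32",        # ขา: KHO KHAI + SARA AA
    "\u0e01\u0e32",        # กา: KO KAI + SARA AA
]

code_point_order = sorted(words)            # กา, ขา, เกม
swapped_order = sorted(words, key=swap_key)  # กา, เกม, ขา
```

With the swap, เกม is ordered under its consonant ก and lands between กา and ขา, which is the order a Thai dictionary uses; a plain code point sort pushes it to the end because the vowel เ has a higher code point than any consonant.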
Finally, the paper discusses the flexibility and extensibility of this multilingual collation support model, which can easily be adapted to support the planned Extension B characters and surrogate pairs. The model is designed with performance and memory consumption in mind and can support more than one million characters in a single sort.
The presentation is accompanied by a multilingual internet application demo to showcase the Oracle 9i multilingual collation features.
22 Jun 2001, Webmaster