Multilingual Collation and Asian Sort Supports in Oracle 9i
Claire Ho, Winson Chu & Simon Law - Oracle Corporation
With the rapid development and broad deployment of internet technologies, multilingual support in software products is drawing increasing attention from companies that intend to expand and grow their global businesses. Oracle has also seen a growing number of requests for multilingual support in recent years. Multilingual collation is a crucial part of multilingual support: it allows customers and employees to search for information and products in any language more quickly and accurately. This paper covers the multilingual collation features and the new Asian sort support in Oracle 9i.
This paper discusses the basic Oracle multilingual collation, which is based on ISO/IEC 14651 with additional special handling of contracting and expanding characters, plus run-time checking for composed and decomposed characters based on the Unicode 3.0 canonical equivalence rules. The SQL string normalization APIs are also covered, to show how this functionality can be used at the SQL level.
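To make these terms concrete, the following Python sketch shows all three behaviors on a toy two-level key. It is not Oracle's implementation: the weight tables, the function name `collation_key`, and the use of code points as base weights are hypothetical. Canonical equivalence is handled by normalizing to NFD with Python's `unicodedata` module before the key is built, so a precomposed "é" (U+00E9) and "e" followed by U+0301 produce the same key. A contracting sequence (traditional Spanish "ch") receives a single weight that sorts after every other "c" sequence, and an expanding character ("æ", "ß") is weighted as two letters. Accents are ignored at the primary level and compared only at the secondary level, loosely following the multilevel comparison of ISO/IEC 14651.

```python
import unicodedata

# Hypothetical toy tables for illustration only; NOT Oracle's sort data.
# Contraction: "ch" gets one weight that sorts after all other "c..." sequences.
CONTRACTIONS = {"ch": (ord("c"), 1)}
# Expansion: one character is weighted as if it were two letters.
EXPANSIONS = {"æ": "ae", "ß": "ss"}


def collation_key(text: str):
    """Two-level key: (primary base-letter weights, secondary accent weights).

    Assumes lowercase input; code points stand in for real collation weights.
    """
    # Canonical equivalence: composed and decomposed forms share one NFD form.
    s = unicodedata.normalize("NFD", text)
    primary, secondary = [], []
    i = 0
    while i < len(s):
        ch = s[i]
        if unicodedata.combining(ch):
            # Combining marks are ignorable at the primary level.
            secondary.append(ord(ch))
            i += 1
        elif s[i:i + 2] in CONTRACTIONS:
            primary.append(CONTRACTIONS[s[i:i + 2]])
            i += 2
        else:
            for c in EXPANSIONS.get(ch, ch):
                primary.append((ord(c), 0))
            i += 1
    return (tuple(primary), tuple(secondary))


print(collation_key("\u00e9") == collation_key("e\u0301"))  # True
print(sorted(["cz", "d", "chico", "cuna"], key=collation_key))
# ['cuna', 'cz', 'chico', 'd']
print(sorted(["resumes", "r\u00e9sum\u00e9", "resume"], key=collation_key))
# ['resume', 'résumé', 'resumes']
```

Real ISO/IEC 14651 tables define more levels and far richer weights; the sketch only shows where normalization, contraction, and expansion fit into key construction.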
Among the new Asian sorts that Oracle 9i supports, the paper also introduces context-sensitive sorting for Japanese and swap-with-next-character sorting for Thai and Lao.
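Both rules can be pictured as preprocessing applied before weights are assigned. The Python below is a minimal sketch under stated assumptions, not Oracle's sort data: code-point order stands in for real weights, and the function names and tables are hypothetical. For Thai and Lao, vowels that are written before their consonant (Thai U+0E40 to U+0E44, Lao U+0EC0 to U+0EC4) are swapped with the next character, so words group by initial consonant as in dictionary order. For Japanese, the sketch assumes one common context-sensitive rule, in which the prolonged sound mark ー (U+30FC) sorts as the vowel of the preceding kana; the katakana-to-vowel table is a deliberately tiny subset.

```python
# Thai (U+0E40..U+0E44) and Lao (U+0EC0..U+0EC4) vowels written before their consonant.
PREVOWELS = set(range(0x0E40, 0x0E45)) | set(range(0x0EC0, 0x0EC5))

# Hypothetical, deliberately tiny table: katakana -> the vowel it ends in.
KATAKANA_VOWEL = {"カ": "ア", "キ": "イ", "ク": "ウ", "ケ": "エ", "コ": "オ"}
PROLONGED_SOUND_MARK = "\u30fc"  # ー


def swap_prevowels(text: str) -> str:
    """Swap each Thai/Lao prevowel with the character that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if ord(chars[i]) in PREVOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # this pair is done; do not swap the vowel again
        else:
            i += 1
    return "".join(chars)


def expand_prolonged_mark(text: str) -> str:
    """Replace ー with the vowel of the preceding kana (context-sensitive rule)."""
    out = []
    for ch in text:
        if ch == PROLONGED_SOUND_MARK and out and out[-1] in KATAKANA_VOWEL:
            out.append(KATAKANA_VOWEL[out[-1]])
        else:
            out.append(ch)
    return "".join(out)


thai = ["\u0e40\u0e01", "\u0e01\u0e32", "\u0e02\u0e32"]  # เก, กา, ขา
print(sorted(thai))                      # ['กา', 'ขา', 'เก'] (raw code points)
print(sorted(thai, key=swap_prevowels))  # ['กา', 'เก', 'ขา'] (consonant first)

kana = ["カイ", "カー"]
print(sorted(kana))                             # ['カイ', 'カー'] (U+30A4 < U+30FC)
print(sorted(kana, key=expand_prolonged_mark))  # ['カー', 'カイ'] (カー keys as カア)
```

Applying the rewrite only inside the sort key, rather than to the stored string, leaves the data untouched and keeps each language rule local to its sort.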
Finally, the paper discusses the flexibility and extensibility of this multilingual collation model, which can easily be adapted to support the future CJK Extension B characters, encoded in UTF-16 as surrogate pairs. The model takes performance and memory consumption into account and can support more than one million characters in a single sort.
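CJK Extension B characters lie outside the Basic Multilingual Plane (block U+20000 to U+2A6DF), so in UTF-16 each one is a pair of 16-bit code units; the full code space U+0000 to U+10FFFF holds 1,114,112 code points. The Python sketch below, with hypothetical helper names, combines surrogate pairs into code points using cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00). It also shows why a sort that spans the whole code space should not simply compare raw UTF-16 code units: surrogates (0xD800 to 0xDFFF) are numerically below 0xE000 to 0xFFFF, so in binary code-unit order an Extension B character lands before a BMP character such as U+FF21. How Oracle itself stores weights for surrogate pairs is not shown here.

```python
from typing import List


def to_utf16_units(s: str) -> List[int]:
    """Encode a string as big-endian UTF-16 and return its 16-bit code units."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def utf16_code_points(units: List[int]) -> List[int]:
    """Combine high/low surrogate pairs into supplementary code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            out.append(u)  # BMP code unit (an unpaired surrogate is passed through)
            i += 1
    return out


ext_b = "\U00020000"    # first code point of the CJK Extension B block
fullwidth_a = "\uFF21"  # FULLWIDTH LATIN CAPITAL LETTER A, in the BMP

print([hex(u) for u in to_utf16_units(ext_b)])           # ['0xd840', '0xdc00']
print(hex(utf16_code_points(to_utf16_units(ext_b))[0]))  # 0x20000

# Binary UTF-16 code-unit order puts the Extension B character first.
print(sorted([fullwidth_a, ext_b], key=to_utf16_units))
# Code-point order puts it after U+FF21.
print(sorted([ext_b, fullwidth_a], key=lambda s: utf16_code_points(to_utf16_units(s))))
```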
The presentation is accompanied by a multilingual internet application demo to showcase Oracle 9i multilingual collation features.
International Unicode Conferences are organized by Global Meeting Services, Inc. (GMS).
17 April 2001, Webmaster