UnicodeIUC22
Program Showcase Registration Accommodation Travel Sponsors
Unicode Standard Conference Board Conference CD Last Conference Past Conferences Next Conference
Abstract

An Integrated Greater China Data Base

C.C. Hsu - IBM Taiwan

Intended Audience: Software Engineers, Systems Analysts, Marketers, Content Developers, Font Designers, Technical Writers
Session Level: Beginner, Intermediate

Information for the four Greater China geographies (mainland China, Hong Kong, Macao, and Taiwan) needs to be integrated. In particular, two-way conversion between simplified Chinese and traditional Chinese is urgently required.

An integrated Greater China data base covers not only code-based conversion but also character, phonetic, word, font, and meaning conversion. The data base is built in a Unicode environment. It gives natural language processing products more accurate and faster conversion between simplified Chinese and traditional Chinese data.
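As a rough illustration of code-based conversion in a Unicode environment, the sketch below re-encodes bytes from one legacy code page to another by decoding to Unicode first. The helper names `transcode` and `can_encode` are hypothetical, and the sample characters are our own picks rather than entries from the data base described here. It relies only on Python's built-in `big5`, `gb2312`, and `gbk` codecs.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode bytes between legacy code pages, using Unicode as the pivot."""
    return data.decode(source).encode(target)


def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# A character shared by both repertoires survives a Big-5 -> GBK round trip.
big5_bytes = "那".encode("big5")
gbk_bytes = transcode(big5_bytes, "big5", "gbk")
print(gbk_bytes.decode("gbk"))

# A traditional-only form is missing from GB2312 but present in GBK.
print(can_encode("許", "gb2312"), can_encode("許", "gbk"))
```

Code conversion of this kind only moves a character between code pages. It does not decide whether a simplified form should become a traditional one, which is why the other conversion layers are needed.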

In this presentation, we will cover the following topics.

  • Code conversion: one-to-one, one-to-many, and many-to-one mappings among GB2312/GBK/GB13080/Big-5/Big-5+/CNS11643/UCS
  • Character conversion within Unicode: e.g., (that) is the same, (allow) is different, and (queen & behind) is a mixed mapping that must be resolved from context
  • Phonetic conversion: mixed mappings also occur among BoPoMoFo/PinYin/index number, e.g., /hao3 (good) is the same, /wei (danger) is different, and /qian2 (dry & proper noun) is mixed
  • Word conversion: must be aware of region-specific words, e.g., Hello/ is the same, Sydney/ is different, /old women is a Taiwan-only word, and /chat is a China-only word
  • Font conversion: may be solved by changing the font style, e.g., (alike) is the same and (door) is different
  • Meaning conversion: still under study, e.g., /cuttlefish & spray/spray is a mixed situation
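The "same", "different", and "mixed" cases for character conversion can be sketched as a small lookup, with a word-level table consulted first so that context settles the mixed case. This is a minimal sketch under stated assumptions, not the data base presented here: the tables `S2T` and `WORDS`, the function `to_traditional`, and the sample characters are all hypothetical and chosen only for illustration.

```python
# Simplified -> traditional candidates (illustrative entries only).
S2T = {
    "那": ["那"],        # same in both scripts
    "许": ["許"],        # different, one-to-one
    "后": ["后", "後"],  # mixed: "queen" keeps 后, "behind" becomes 後
}

# Word-level entries that resolve the mixed case by context (illustrative).
WORDS = {
    "皇后": "皇后",  # empress: 后 stays 后
    "后面": "後面",  # behind: 后 becomes 後
}

MAX_WORD_LEN = max(len(word) for word in WORDS)


def to_traditional(text: str) -> str:
    """Convert simplified text, preferring the longest word-level match."""
    out = []
    i = 0
    while i < len(text):
        # Try multi-character words first, longest candidate first.
        for n in range(min(MAX_WORD_LEN, len(text) - i), 1, -1):
            chunk = text[i:i + n]
            if chunk in WORDS:
                out.append(WORDS[chunk])
                i += n
                break
        else:
            # No word matched: fall back to the first character candidate.
            ch = text[i]
            out.append(S2T.get(ch, [ch])[0])
            i += 1
    return "".join(out)


print(to_traditional("皇后在后面"))
```

Falling back to the first candidate is a simplification. A fuller system would more likely flag an unresolved mixed mapping for review than guess silently.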

The cross-relationships among the above categories will also be shown.


International Unicode Conferences are organized by Global Meeting Services, Inc., (GMS). GMS is pleased to be able to offer the International Unicode Conferences under an exclusive license granted by the Unicode Consortium. All responsibility for conference finances and operations is borne by GMS. The independent conference board serves solely at the pleasure of GMS and is composed of volunteers active in Unicode and in international software development. All inquiries regarding International Unicode Conferences should be addressed to info@global-conference.com.

Unicode and the Unicode logo are registered trademarks of Unicode, Inc. Used with permission.

23 May 2002, Webmaster