Re: UTF-16 or UTF-32 on Oracle 8

From: Addison Phillips [GSC] (addison@globalsight.com)
Date: Fri Apr 14 2000 - 18:56:22 EDT


Note that if you store your text as UTF-32 (as a binary object or by
coercing the database somehow), you will lose Oracle's ability to natively
process your text. That is, you will have to collate the text yourself,
parse the text yourself, index the text yourself, etc. This is a significant
loss in terms of database functionality. There are also limitations on how
you can use raw fields in Oracle (and other databases).
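
To make the collation point concrete, here is a minimal sketch in Python. It assumes the text is stored as little-endian UTF-32 and that all the database can do with an opaque binary value is compare it byte by byte. Both are my assumptions for illustration, not documented Oracle behavior, and the sample words are arbitrary.

```python
# Illustration only: byte-wise ordering of UTF-32LE data differs from
# code point ordering, so a store that sees only raw bytes cannot sort it.
words = ["b", "\u0100"]  # 'b' is U+0062, the second is U+0100

# Python compares strings by code point, so 'b' (0x62) sorts first.
by_code_point = sorted(words)

# Assumed stand-in for a database comparing raw binary values:
# 'b' encodes as 62 00 00 00 and U+0100 as 00 01 00 00.
by_raw_bytes = sorted(words, key=lambda w: w.encode("utf-32-le"))

print(by_code_point == by_raw_bytes)  # the two orders disagree
```

Neither order is a real linguistic collation, of course. The point is only that once the text is opaque bytes, even this much becomes your job.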

You will probably want to reconsider storing the data in the database as
UTF-32 directly. You know your application best, so using UTF-32 internally
is probably fine, but it is straightforward to convert between UTF-32 and
UTF-8 (or UTF-16) whenever you store data in or retrieve it from the
database. Plus you'll save all of that storage space ;-).
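
As a rough sketch of that conversion at the database boundary, assuming Python and a UTF-8 database character set: the helper names to_db and from_db are hypothetical, and the choice of little-endian UTF-32 for the in-memory form is my assumption, not something from the original question.

```python
# Hypothetical helpers: UTF-32 inside the application, UTF-8 in the database.

def to_db(utf32_bytes: bytes) -> bytes:
    """Convert the application's UTF-32LE buffer to UTF-8 for storage."""
    return utf32_bytes.decode("utf-32-le").encode("utf-8")

def from_db(utf8_bytes: bytes) -> bytes:
    """Convert UTF-8 read from the database back to UTF-32LE."""
    return utf8_bytes.decode("utf-8").encode("utf-32-le")

text = "Unicode \u00e9\u4e2d"        # 8 ASCII characters plus U+00E9 and U+4E2D
internal = text.encode("utf-32-le")  # what the application holds in memory
stored = to_db(internal)             # what would be written to the database

assert from_db(stored) == internal   # the round trip is lossless
print(len(internal), len(stored))    # 40 bytes as UTF-32, 13 bytes as UTF-8
```

The size gap shrinks for text that is mostly outside ASCII, but UTF-8 never needs more than four bytes per character, so it is never larger than UTF-32.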

Personally, I doubt that you're gaining anything in real simplicity by using
UTF-32 for *storage*, since there are no libraries, databases, data types,
or other support mechanisms in widespread use or built into operating
systems and databases to help you along. I'd rather not debate the relative
merits of using UTF-32 internally to your application (for all I know you
are writing an Egyptian Hieroglyphics word processor), but it is generally a
Bad Idea to go against the design of your underlying system and try to write
code to do all of your sorting, display rendering, normalization, etc.,
yourself.

thanks,

Addison

Addison P. Phillips
Senior Globalization Consultant
Global Sight Corporation
mailto:addison@globalsight.com
================================
(+1) 408.350.3600 - Telephone
http://www.globalsight.com
================================
----- Original Message -----
From: Jianping Yang <jiyang@us.oracle.com>
To: Unicode List <unicode@unicode.org>
Cc: Unicode List <unicode@unicode.org>
Sent: Friday, April 14, 2000 2:42 PM
Subject: Re: UTF-16 or UTF-32 on Oracle 8

> Oracle8 does not support the UTF-32 encoding form as a database character
> set or in its access API. If you want simplicity for your case, you may
> use a BLOB to store UTF-32 in the database without any character set
> conversion.
>
> If you have any more questions, please send them to me directly.
>
> Regards,
> Jianping.
>
> attilla ong wrote:
>
> > Hi Folks,
> >
> > If this topic has already been beaten to death,
> > please direct me to the Oracle I18N expert that I know
> > is on the mailing list :-)
> >
> > We had initially planned to save our data as UTF-32.
> > We opted for simplicity over database size.
> > We just installed Oracle but realized that it
> > supports only UTF-8 natively (at least according to
> > its documentation).
> >
> > The question is whether Oracle8 supports UTF-32
> > natively without us having to coerce a character into
> > a 4-byte datatype. If not, what are the consequences
> > of insisting on storing an array of UTF-32 characters
> > as an array of a 4-byte datatype (or whatever it's
> > called)?
> >
> > excerpt from Oracle 8's documentation:
> >
> > "The Unicode character repertoire can be represented
> > in a number of different encoding formats. UCS-2 is a
> > two-byte fixed-width format, UTF-8 is a multi-byte
> > format with variable width. Oracle8 supports the
> > UTF-8 format only. UTF-8 is an ASCII-compatible
> > encoding scheme. The Oracle character set name for
> > Unicode 2.0 is "UTF8". Unicode 1.1 has been supported
> > with the Oracle character set name of "AL24UTFFSS"
> > since Oracle7"
> >
> > Thank you.
> >
>


