Re: Oracle and Surrogate Pairs

From: Jianping Yang (Jianping.Yang@oracle.com)
Date: Mon Jul 24 2000 - 22:06:17 EDT


Mikko,

The Oracle UTF8 character set definition supports surrogates by encoding
each pair as two 3-byte sequences, which keeps it in sync with UTF-16 in
binary sort order and code point. As a result, if you want an exact match
in surrogate support, working out how many bytes you need in UTF8 is the
same problem as working out how many ushorts (16-bit units) you need in
UTF-16. However, since storage for a varchar type is allocated dynamically
based on the actual data, you may want to declare the size a little larger
to allow for potential surrogate support.
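
To make the byte counts concrete, here is a minimal Python sketch of the
scheme described above, in which each half of a surrogate pair is written
as its own 3-byte sequence. The function names (oracle_utf8_encode,
utf16_units) are hypothetical and only for illustration; this is not
Oracle code, and it assumes well-formed input text.

```python
def oracle_utf8_encode(text: str) -> bytes:
    """Encode text so that supplementary characters become two 3-byte
    sequences (one per UTF-16 surrogate), as described in this message.
    Illustrative sketch only; assumes the input is well-formed."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 | (cp >> 10)
            low = 0xDC00 | (cp & 0x3FF)
            # Write each surrogate as a 3-byte sequence.
            for unit in (high, low):
                out.append(0xE0 | (unit >> 12))
                out.append(0x80 | ((unit >> 6) & 0x3F))
                out.append(0x80 | (unit & 0x3F))
        else:
            # BMP characters use ordinary UTF-8 (1 to 3 bytes).
            out += ch.encode("utf-8")
    return bytes(out)


def utf16_units(text: str) -> int:
    """Number of 16-bit code units (ushorts) the text needs in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


ext_b = "\U00020000"  # first code point of CJK Extension B
print(len(oracle_utf8_encode(ext_b)))  # bytes as two 3-byte sequences
print(len(ext_b.encode("utf-8")))      # bytes in standard UTF-8
print(utf16_units(ext_b))              # ushorts in UTF-16
```

Under these assumptions a supplementary character takes 6 bytes here versus
4 bytes in standard UTF-8, and always exactly 3 bytes per UTF-16 unit, so a
column sized at 3 bytes per ushort covers the surrogate case as well.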

Regards,
Jianping.

Mikko Lahti wrote:

> What is the recommendation when it comes to dealing with surrogate
> pairs and supporting CJK Unified Ideographs Extension B (especially
> HKSCS), which will be in the next version of the Unicode Standard?
>
> Mikko
>
> -----Original Message-----
> From: Jianping Yang [mailto:Jianping.Yang@oracle.com]
> Sent: Monday, July 24, 2000 5:08 PM
> To: Mikko Lahti
> Cc: Unicode List
> Subject: Re: Oracle and Surrogate Pairs
>
> Mikko,
>
> As no characters are defined in the surrogate range in Unicode 3.0,
> the maximum width of a character in the Oracle UTF8 character set is
> 3 bytes. I therefore recommend sizing a column at 3 times the number
> of characters you intend to store in it.
>
> Regards,
> Jianping.
>
> Mikko Lahti wrote:
>
> What is the correct way of supporting surrogate pairs in Oracle 8?
> Is there anything wrong with the approach of making fields 3 times
> longer than for ASCII, or should fields be 4 times the ASCII length,
> as per the UTF-8 spec?
>
> Later,
>
> Mikko
> Globalization Specialist
> Onyx Software
> MikkoL@onyx.com
> www.onyx.com
> 425.519.4172