Re: Oracle and Surrogate Pairs

From: Jianping Yang
Date: Mon Jul 24 2000 - 22:06:17 EDT


Because the Oracle UTF8 character set definition supports surrogates as
a pair of two 3-byte sequences, to stay in sync with UTF-16 in binary
sort order and code point values, you face the same question in deciding
how many bytes to allow for UTF8 as in deciding how many ushorts to
allow for UTF-16, if you want an exact match in surrogate support. But
since memory for a varchar type is allocated dynamically based on the
actual data, you may simply declare the size a little larger to allow
for potential surrogate support.
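
To make the byte arithmetic concrete, here is a minimal Python sketch
(not Oracle code). It assumes the encoding behaves as described above,
with each supplementary character stored as a surrogate pair of two
3-byte sequences, and compares that with standard UTF-8 and UTF-16.
The function names are hypothetical and chosen only for illustration.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units (ushorts) needed for text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def standard_utf8_bytes(text: str) -> int:
    """Count bytes in standard UTF-8 (4 bytes per supplementary character)."""
    return len(text.encode("utf-8"))


def surrogate_pair_utf8_bytes(text: str) -> int:
    """Count bytes when each supplementary character is stored as a
    surrogate pair of two 3-byte sequences (assumed behavior)."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2      # two-byte range
        elif cp < 0x10000:
            total += 3      # rest of the BMP
        else:
            total += 6      # high + low surrogate, 3 bytes each
    return total


# "A", a BMP ideograph (U+4E2D), and an Extension B ideograph (U+20000)
sample = "A\u4e2d\U00020000"
print(utf16_units(sample))                # 4
print(standard_utf8_bytes(sample))        # 8
print(surrogate_pair_utf8_bytes(sample))  # 10
```

Under this assumption the worst case is 3 bytes per UTF-16 code unit,
so a column sized at 3 bytes per ushort also covers surrogate pairs.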


Mikko Lahti wrote:

> What is the recommendation when it comes to dealing with surrogate
> pairs and supporting CJK Unified Ideographs Extension B (especially
> HKSCS), which will be in the next version of the Unicode Standard?
> Mikko
> -----Original Message-----
> From: Jianping Yang
> Sent: Monday, July 24, 2000 5:08 PM
> To: Mikko Lahti
> Cc: Unicode List
> Subject: Re: Oracle and Surrogate Pairs
> Mikko,
> As no characters are defined in the surrogate range in Unicode 3.0,
> the maximum width for the Oracle UTF8 character set is 3 bytes. I
> recommend sizing a column in bytes at 3 times the number of
> characters you intend to store in it.
> Regards,
> Jianping..
> Mikko Lahti wrote:
> What is the correct way of supporting surrogate pairs in Oracle 8?
> Is there anything wrong with the approach of making fields 3 times
> longer than for ASCII, or should fields be 4 times the ASCII length
> as per the UTF-8 spec?
> Later,
> Mikko
> Globalization Specialist
> Onyx Software
> 425.519.4172
