Because the Oracle UTF8 character set definition supports a surrogate pair
as a pair of two 3-byte sequences, which keeps it in sync with UTF-16 in
binary sort order and code point values, you face the same issue in
determining how many bytes UTF8 needs as in determining how many ushorts
UTF-16 needs, if you want an exact match in surrogate support. But since
memory for the varchar type is allocated dynamically based on the actual
data, you may need to declare the size a little larger to allow for
potential surrogate support.
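
To make the byte arithmetic concrete, here is a minimal sketch in Python. It assumes the behavior described above, namely that a supplementary character is stored as two 3-byte sequences (6 bytes) rather than the 4 bytes of standard UTF-8. The function names and the sample string are hypothetical and are not part of any Oracle API.

```python
def utf16_units(text: str) -> int:
    """Count 16-bit code units (ushorts) needed to hold text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def standard_utf8_bytes(text: str) -> int:
    """Byte length in standard UTF-8 (supplementary characters take 4 bytes)."""
    return len(text.encode("utf-8"))


def surrogate_pair_utf8_bytes(text: str) -> int:
    """Byte length when each surrogate of a pair is encoded as its own
    3-byte sequence, as assumed here for the Oracle UTF8 character set."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2      # two-byte range
        elif cp < 0x10000:
            total += 3      # rest of the BMP
        else:
            total += 6      # surrogate pair: 2 x 3 bytes
    return total


def worst_case_column_bytes(text: str) -> int:
    """Upper bound: 3 bytes for every UTF-16 code unit."""
    return 3 * utf16_units(text)


# Hypothetical sample: ASCII letter, BMP ideograph, Extension B ideograph.
sample = "A\u4e2d\U00020000"
print(utf16_units(sample))                # 4 code units
print(standard_utf8_bytes(sample))        # 8 bytes
print(surrogate_pair_utf8_bytes(sample))  # 10 bytes
print(worst_case_column_bytes(sample))    # 12 bytes
```

Under this assumption, sizing a column at 3 bytes per UTF-16 code unit always covers the stored data, because a BMP character needs at most 3 bytes for its one code unit and a supplementary character needs 6 bytes for its two.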
Mikko Lahti wrote:
> What is the recommendation when it comes to dealing with surrogate
> pairs and supporting CJK Unified Ideographs Extension B (especially
> HKSCS), which will be in the next version of the Unicode standard?
> -----Original Message-----
> From: Jianping Yang [mailto:Jianping.Yang@oracle.com]
> Sent: Monday, July 24, 2000 5:08 PM
> To: Mikko Lahti
> Cc: Unicode List
> Subject: Re: Oracle and Surrogate Pairs
> As there are no characters defined in the surrogate range in Unicode
> 3.0, the maximum width of a character in the Oracle UTF8 character set
> is 3 bytes. I recommend declaring a column size of 3 times the number
> of characters you intend to store in it.
> Mikko Lahti wrote:
> What is the correct way of supporting surrogate pairs in Oracle 8?
> Is there anything wrong with the approach of making fields 3 times
> longer than for ASCII, or should fields be 4 times the ASCII length
> as per the UTF-8 spec?
> Globalization Specialist
> Onyx Software