Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 19:20:45 EDT


From: "John Cowan" <cowan@locke.ccil.org>
> > Are you saying that a value made up of twelve 16-bit values that was
> > actually six surrogates would be treated as:
> >
> > a) Six characters with unknown sort characteristics, or
> >
> > b) Twelve characters, at least six of which would have unknown sort
> > characteristics (since the first two bytes of a surrogate would not have a
> > defined sort order and the second two bytes might randomly coincide
> > with an existing BMP value when treated as a separate Unicode code point).
>
> I can't answer the question, but there is an erroneous preconception here.
> Neither of the 16-bit units of a surrogate pair can coincide with any
> existing BMP value.

Yes, this is true, and was a mistake on my part.
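
For anyone who wants to check John's point, here is a minimal Python sketch. The helper names (utf16_units, is_surrogate) are mine and purely illustrative. It relies only on the fact that the 16-bit units of a surrogate pair come from the reserved ranges D800-DBFF (high) and DC00-DFFF (low), which are never assigned to BMP characters, so a lone unit cannot be mistaken for one.

```python
# Reserved UTF-16 surrogate ranges; no BMP character is assigned here.
HIGH_START, HIGH_END = 0xD800, 0xDBFF
LOW_START, LOW_END = 0xDC00, 0xDFFF


def utf16_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(unit):
    """True if the unit lies anywhere in the surrogate block D800-DFFF."""
    return HIGH_START <= unit <= LOW_END


# One supplementary code point becomes two units, both inside the block.
pair = utf16_units("\U00010400")
print([hex(u) for u in pair], all(is_surrogate(u) for u in pair))

# Ordinary BMP characters never produce a unit inside the block.
bmp = utf16_units("A\u4e2d")
print([hex(u) for u in bmp], any(is_surrogate(u) for u in bmp))
```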

> > I would call (a) "surrogate aware", and (b) "surrogate safe", where "safe"
> > would be defined as "at least the data did not get corrupted!". Obviously it
> > is not entirely safe when you are considering collation and intrinsic string
> > manipulation issues.
>
> Every surrogate-unaware application is surrogate-safe in your limited
> sense, unless it goes to the trouble of weeding out surrogates (which is
> pointless). True surrogate-unsafeness appears when you allow things like
> inserting characters into a string, in which case it is unsafe to
> allow inserting after a high-part surrogate.

Ah, well by that definition, SQL Server 7.0 is not surrogate-safe, either,
to the extent that you could use Transact-SQL scalar functions such as STUFF
to do just that. Luckily, such operations would be relatively uncommon.
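
To make the failure concrete, here is a rough Python sketch of the check a surrogate-aware insert would need. It models a string as a list of 16-bit code units. The function names are hypothetical, and this does not describe how STUFF or any other SQL Server function behaves; it only shows the rule John describes, that inserting directly after a high surrogate splits the pair.

```python
def is_high_surrogate(unit):
    """True for the first unit of a surrogate pair (D800-DBFF)."""
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    """True for the second unit of a surrogate pair (DC00-DFFF)."""
    return 0xDC00 <= unit <= 0xDFFF


def safe_insert(units, index, new_units):
    """Insert new_units before position index, refusing to split a pair.

    units and new_units are lists of 16-bit code units. Hypothetical
    helper for illustration, not an existing library or database API.
    """
    splits_pair = (
        0 < index < len(units)
        and is_high_surrogate(units[index - 1])
        and is_low_surrogate(units[index])
    )
    if splits_pair:
        raise ValueError("insert position falls inside a surrogate pair")
    return units[:index] + new_units + units[index:]


# "A", one surrogate pair, "B"
units = [0x0041, 0xD801, 0xDC00, 0x0042]

# Inserting before the pair is fine.
print([hex(u) for u in safe_insert(units, 1, [0x0058])])

# Inserting between the high and low units is rejected.
try:
    safe_insert(units, 2, [0x0058])
except ValueError as exc:
    print("rejected:", exc)
```

A surrogate-unaware insert would simply skip the check and leave two unpaired surrogates behind.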

michka

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT