Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 19:20:45 EDT

Next message: John Cowan: "Re: Summary: xml:lang validity and RFC 1766 refs to outdated codes [l"
Previous message: John Cowan: "Re: Encodings for SQL Databases"
In reply to: John Cowan: "Re: Encodings for SQL Databases"

From: "John Cowan" <cowan@locke.ccil.org>
> > Are you saying that a value made up of twelve 16-bit values that was
> > actually six surrogate pairs would be treated as:
> >
> > a) Six characters with unknown sort characteristics, or
> >
> > b) Twelve characters, at least six of which would have unknown sort
> > characteristics (since the first two bytes of a surrogate would not
> > have a defined sort order, and the second two bytes might randomly
> > coincide with an existing BMP value when treated as a separate
> > Unicode code point)?
>
> I can't answer the question, but there is an erroneous preconception here.
> Neither of the 16-bit units of a surrogate pair can coincide with any
> existing BMP value.

Yes, this is true, and was a mistake on my part.
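
To make the point concrete, here is a minimal sketch in Python (not anything SQL Server does internally). It assumes the standard UTF-16 surrogate ranges, D800-DBFF for the high half and DC00-DFFF for the low half, which are reserved and never assigned to BMP characters. The helper names are hypothetical and chosen only for illustration.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high_surrogate(unit):
    # High (leading) surrogates occupy D800..DBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Low (trailing) surrogates occupy DC00..DFFF.
    return 0xDC00 <= unit <= 0xDFFF


# U+1D11E lies outside the BMP, so UTF-16 needs two code units for it.
units = utf16_units("\U0001D11E")
print([hex(u) for u in units])

# Both units fall inside the reserved surrogate block, so neither one
# can be mistaken for an ordinary BMP character.
print(is_high_surrogate(units[0]), is_low_surrogate(units[1]))
```

Because both halves come from the reserved block, a surrogate-unaware sort sees two code points it has no weights for, rather than one unknown value and one accidental match.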

> > I would call (a) "surrogate aware", and (b) "surrogate safe", where
> > "safe" would be defined as "at least the data did not get corrupted!".
> > Obviously it is not entirely safe when you are considering collation
> > and intrinsic string manipulation issues.
>
> Every surrogate-unaware application is surrogate-safe in your limited
> sense, unless it goes to the trouble of weeding out surrogates (which is
> pointless). True surrogate-unsafeness appears when you allow things like
> inserting characters into a string, in which case it is unsafe to
> allow inserting after a high-part surrogate.

Ah, well, by that definition SQL Server 7.0 is not surrogate-safe either,
to the extent that you could use Transact-SQL scalar functions such as STUFF
to do just that. Luckily, such operations would be relatively uncommon.
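
The failure mode can be sketched in Python rather than Transact-SQL. This is an illustration under stated assumptions, not a reproduction of STUFF: the string is modelled as a list of 16-bit code units, positions are zero-based code-unit offsets, and naive_insert, splits_pair, safe_insert and is_well_formed are hypothetical helpers invented for the example.

```python
def to_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def naive_insert(units, index, new_units):
    # Surrogate-unaware: splices at any code-unit offset.
    return units[:index] + new_units + units[index:]


def splits_pair(units, index):
    # True when index falls between a high surrogate and its low surrogate.
    return 0 < index < len(units) and is_high(units[index - 1]) and is_low(units[index])


def safe_insert(units, index, new_units):
    # One possible policy (an assumption): move the insertion point
    # past the pair instead of splitting it.
    if splits_pair(units, index):
        index += 1
    return units[:index] + new_units + units[index:]


def is_well_formed(units):
    # Every high surrogate must be followed by a low one; no stray lows.
    i = 0
    while i < len(units):
        if is_high(units[i]):
            if i + 1 >= len(units) or not is_low(units[i + 1]):
                return False
            i += 2
        elif is_low(units[i]):
            return False
        else:
            i += 1
    return True


pair = to_units("\U0001D11E")   # one supplementary character, two units
letter = to_units("X")

broken = naive_insert(pair, 1, letter)
fixed = safe_insert(pair, 1, letter)
print(is_well_formed(broken), is_well_formed(fixed))
```

The naive version leaves a high surrogate followed by an ordinary character, which is exactly the corruption John describes. The aware version has to pick a policy for offsets that land inside a pair, and shifting forward is only one choice.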

michka

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT