Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 16:59:27 EDT


From: <addison@inter-locale.com>
To: "Michael (michka) Kaplan" <michka@trigeminal.com>

> > > b) Twelve characters, at least six of which would have unknown sort
> > > characteristics (since the first two bytes of a surrogate would not have a
> > > defined sort order and the second two byte which might randomly coincide
> > > with an existing BMP value when treated as a separate Unicode code point.
> > >
> Actually, the way surrogates work is: one high surrogate followed by one
> low surrogate. The second value would never, ever, coincide with a valid
> character (in the same way that bytes in UTF-8 multibyte characters never
> collide with valid ASCII values).
>
> So (b) should read:
>
> Twelve characters, all of which have unknown sort characteristics and each
> of which is treated as a separate Unicode code point.
>

Ah, thank you for the correction.... I have been writing all day and my
brain is a bit fuzzy (as is witnessed by my brain freeze above with 16-byte
characters!).
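
For anyone following along, here is a minimal Python sketch of the point
being corrected. It assumes a column holding six supplementary characters
stored as UTF-16; the helper name utf16_units and the choice of U+10400 are
only for illustration. It shows that each such character becomes one high
surrogate (D800..DBFF) followed by one low surrogate (DC00..DFFF), so six of
them are seen as twelve 16-bit values, none of which falls in the range of an
assigned BMP character.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

# One supplementary code point (U+10400, chosen only as an example).
units = utf16_units("\U00010400")
high, low = units

# The first unit is always a high surrogate, the second a low surrogate.
is_high = 0xD800 <= high <= 0xDBFF
is_low = 0xDC00 <= low <= 0xDFFF

# Six such characters occupy twelve 16-bit code units, which is what a
# surrogate-unaware system would count as "twelve characters".
twelve = utf16_units("\U00010400" * 6)

# Every one of those units sits inside the reserved surrogate block.
all_surrogates = all(0xD800 <= u <= 0xDFFF for u in twelve)
```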

> This is, I believe, what SQL Server 7.0 actually does: it is surrogate
> unaware.

That was my understanding as well. That would make it effectively UCS-2: it
will not corrupt such data, but it also will not handle it with any awareness
of surrogates.
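
To illustrate what "no awareness" can mean for ordering, here is a rough
Python sketch. It is not a description of what SQL Server 7.0 does
internally; it only assumes a plain binary comparison of 16-bit code units,
and the two sample strings are picked purely for the demonstration. Under
that assumption, a supplementary character can sort ahead of a BMP character
that has a lower code point.

```python
# Two one-character strings: a BMP character near the top of the BMP and
# the first supplementary code point.
bmp_char = "\uFF5E"
supplementary_char = "\U00010000"

values = [bmp_char, supplementary_char]

# Python 3 compares strings by code point, so U+FF5E sorts before U+10000.
by_code_point = sorted(values)

# Comparing big-endian UTF-16 bytes is equivalent to comparing 16-bit code
# units, which is how a surrogate-unaware binary comparison would see them.
# U+10000 is stored as D800 DC00, and 0xD800 is less than 0xFF5E.
by_code_unit = sorted(values, key=lambda s: s.encode("utf-16-be"))
```

The data itself round-trips intact either way; only the relative order
differs between the two comparisons.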

michka

Michael Kaplan
Trigeminal Software, Inc.
http://www.trigeminal.com/


