Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 16:18:32 EDT


I understand that part.... but you did not answer my question. :-)

Are you saying that a value made up of twelve 16-bit values that was
actually six surrogate pairs would be treated as:

a) Six characters with unknown sort characteristics, or

b) Twelve characters, at least six of which would have unknown sort
characteristics (since the first two bytes of a surrogate pair would not have
a defined sort order, and the second two bytes might randomly coincide
with an existing BMP value when treated as a separate Unicode code point).

I would call (a) "surrogate aware", and (b) "surrogate safe", where "safe"
would be defined as "at least the data did not get corrupted!". Obviously it
is not entirely safe when you are considering collation and intrinsic string
manipulation issues.
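
To make the distinction concrete, here is a minimal Python sketch (not SQL
Server code; the helper names utf16_units, count_aware, and count_safe are
made up for illustration, and U+10000 is just a convenient supplementary
code point) that counts the same twelve 16-bit values both ways:

```python
import struct


def utf16_units(text):
    """Return the 16-bit UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(raw) // 2), raw))


def is_high(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    return 0xDC00 <= unit <= 0xDFFF


def count_aware(units):
    """Interpretation (a): a high+low surrogate pair is one character."""
    count = 0
    i = 0
    while i < len(units):
        if is_high(units[i]) and i + 1 < len(units) and is_low(units[i + 1]):
            i += 2  # consume both halves of the pair
        else:
            i += 1  # BMP unit, or an unpaired surrogate counted on its own
        count += 1
    return count


def count_safe(units):
    """Interpretation (b): every 16-bit unit is treated as a character."""
    return len(units)


# Six supplementary characters, each stored as one surrogate pair.
sample = "\U00010000" * 6
units = utf16_units(sample)
print(len(units), count_aware(units), count_safe(units))  # 12 6 12
```

Under (b) nothing is lost in storage, but any length, substring, or
comparison operation sees twelve items rather than six.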

I was left with the impression by a nameless dev in SQLS that at least in
SQL 7.0, (b) was the case. I would LOVE to be incorrect on this point
though.

michka

----- Original Message -----
From: "Michael Kung" <mkung@microsoft.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Monday, August 07, 2000 12:52 PM
Subject: RE: Encodings for SQL Databases

> SQLServer 7.0 and SQLServer 2000 are surrogate safe on the
> NCHAR/NVARCHAR/NTEXT storage. Until the ISO standard accepts the
> surrogate assignments, any statement about surrogate support does not
> provide any substantial context.
>
> Michael
>
> -----Original Message-----
> From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> Sent: Monday, August 07, 2000 9:01 AM
> To: Unicode List
> Subject: Re: Encodings for SQL Databases
>
> From: <Marco.Cimarosti@icl.com>
>
> > According to the online help of SQL Server 7.0, you have to
> > use the syntax N'abc' to write a Unicode literal in a SQL
> > statement.
> >
> > The N prefix echoes the N in NCHAR and NVARCHAR, and
> > parallels the L"abc" syntax of C (but I wonder, what's that "N"
> > for? One would expect W[ide], L[ong], or U[nicode]).
>
> This stands for "National" and comes from the ANSI-92 specification for
> SQL (pardon the political incorrectness!).
>
> > I then tried saving the script with "Save As...". The choices
> > were "ANSI", "OEM (cp 437)", and "Unicode". Guess which
> > one I chose, and it saved the file in the UTF-16 (or is it UCS-2?)
> > format that is accepted by Notepad (find the file attached).
>
> Technically speaking, UCS-2 might be more accurate since SQL 7.0 does not
> have surrogate awareness. SQL Server 2000 has some surrogate awareness, and
> the sorting of such characters is currently undefined, but I guess you
> could claim it to be UTF-16 (although the docs do not do so).
>
> > About API's, I guess that:
> > 1) The N prefix for string literals should be used as well;
>
> Yes.
>
> > 2) The details of the UTF form used are handled by the API.
>
> Yes, plus. :-)
>
> michka
>
> Michael Kaplan
> Trigeminal Software, Inc.
> http://www.trigeminal.com/
>


