RE: Encodings for SQL Databases

From: Michael Kung (
Date: Mon Aug 07 2000 - 17:15:59 EDT

From a sorting point of view, since no characters have officially been
assigned to surrogate code points yet, the sort key for surrogate characters
is more or less undefined in 7.0. The data is not corrupted, but for sorting
purposes it is handled as part of the undefined category. In other words,
SQL 7.0 treats these characters the same as user-defined characters.
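To make the sorting point concrete, here is a minimal Python sketch. It is not SQL Server's actual collation, and the helper names are made up for illustration. It shows that even a plain binary ordering depends on whether strings are compared as 16-bit units or as code points. A surrogate pair starts with a unit in U+D800 to U+DBFF, so under unit order a supplementary character sorts before BMP characters in U+E000 to U+FFFF (such as U+FF21), while under code point order it sorts after them.

```python
# Compare two naive binary orderings of the same strings: by UTF-16 code
# units (what a UCS-2-style comparison sees) and by Unicode code points.

def utf16_units(s: str) -> list[int]:
    """Return the string as a list of 16-bit UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def code_points(s: str) -> list[int]:
    """Return the string as a list of Unicode code points."""
    return [ord(ch) for ch in s]

# U+FF21 is a BMP character; U+10400 is a supplementary-plane code point.
words = ["\uFF21", "\U00010400"]

by_units = sorted(words, key=utf16_units)   # U+10400 is D801 DC00, and D801 < FF21
by_points = sorted(words, key=code_points)  # 0xFF21 < 0x10400

print([hex(ord(w)) for w in by_units])   # ['0x10400', '0xff21']
print([hex(ord(w)) for w in by_points])  # ['0xff21', '0x10400']
```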

Again, until ISO defines the actual characters, no surrogate collation
implementation has any substantive meaning. I would certainly like to know
whether other users have an index that depends on surrogate collation today.
If you do, please let me know ASAP.



-----Original Message-----
From: Michael (michka) Kaplan []
Sent: Monday, August 07, 2000 1:19 PM
To: Michael Kung; Unicode List
Subject: Re: Encodings for SQL Databases

I understand that part.... but you did not answer my question. :-)

Are you saying that a value made up of twelve 16-bit values that was
actually six surrogate pairs would be treated as:

a) Six characters with unknown sort characteristics, or

b) Twelve characters, at least six of which would have unknown sort
characteristics (since the first two bytes of a surrogate pair would not
have a defined sort order, and the second two bytes might randomly coincide
with an existing BMP value when treated as a separate Unicode code point)?
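In UTF-16 each supplementary code point is stored as a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF), so six such characters take twelve 16-bit units. The following small Python sketch shows the arithmetic, using hypothetical helper names. Option (a) corresponds to counting code points and option (b) to counting units.

```python
# Six supplementary-plane code points stored as UTF-16 occupy twelve
# 16-bit code units: one high (lead) and one low (trail) surrogate each.

def split_pair(cp: int) -> tuple[int, int]:
    """Encode a code point above U+FFFF as a (high, low) surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

def join_pair(hi: int, lo: int) -> int:
    """Decode a (high, low) surrogate pair back to its code point."""
    assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

text = "".join(chr(0x20000 + i) for i in range(6))  # six arbitrary supplementary code points
units = len(text.encode("utf-16-le")) // 2           # number of 16-bit units

print(len(text), units)                       # 6 12
print([hex(u) for u in split_pair(0x20000)])  # ['0xd840', '0xdc00']
```

Both halves lie in the reserved surrogate range, so a system that reads unit by unit sees two unpaired surrogate code points rather than ordinary BMP characters.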

I would call (a) "surrogate aware", and (b) "surrogate safe", where "safe"
would be defined as "at least the data did not get corrupted!". Obviously it
is not entirely safe when you are considering collation and intrinsic string
manipulation issues.
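String manipulation shows how storage can be surrogate safe while a function that counts 16-bit units still cuts a pair in half. The Python sketch below contrasts a naive LEFT-style truncation with one that backs off a dangling high surrogate. Both helper functions are hypothetical, not SQL Server functions.

```python
# "Surrogate safe" storage keeps the bytes intact, but string operations
# that count 16-bit units can still cut a surrogate pair in half.

def to_units(s: str) -> list[int]:
    """Return the string as a list of 16-bit UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def left_units_naive(units: list[int], n: int) -> list[int]:
    """Keep the first n code units, ignoring surrogate pairs."""
    return units[:n]

def left_units_aware(units: list[int], n: int) -> list[int]:
    """Keep at most n code units without leaving a dangling high surrogate."""
    cut = units[:n]
    if cut and 0xD800 <= cut[-1] <= 0xDBFF:
        cut = cut[:-1]  # drop the orphaned lead half of a pair
    return cut

units = to_units("ab\U00020000c")  # a, b, high surrogate, low surrogate, c

naive = left_units_naive(units, 3)
aware = left_units_aware(units, 3)
print([hex(u) for u in naive])  # ['0x61', '0x62', '0xd840']
print([hex(u) for u in aware])  # ['0x61', '0x62']
```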

A nameless developer on the SQL Server team left me with the impression
that, at least in SQL 7.0, (b) was the case. I would LOVE to be incorrect on
this point.


----- Original Message -----
From: "Michael Kung" <>
To: "Unicode List" <>
Sent: Monday, August 07, 2000 12:52 PM
Subject: RE: Encodings for SQL Databases

> SQL Server 7.0 and SQL Server 2000 are surrogate safe for
> NCHAR/NVARCHAR/NTEXT storage. Until the ISO standard accepts the
> surrogate assignments, any statement about surrogate support has little
> substantive meaning.
> Michael
> -----Original Message-----
> From: Michael (michka) Kaplan []
> Sent: Monday, August 07, 2000 9:01 AM
> To: Unicode List
> Subject: Re: Encodings for SQL Databases
> From: <>
> > According to the online help of SQL Server 7.0, you have to
> > use the syntax N'abc' to write a Unicode literal in a SQL
> > statement.
> >
> > The N prefix echoes the N in NCHAR and NVARCHAR, and
> > parallels the L"abc" syntax of C (but I wonder, what's that "N"
> > for? One would expect W[ide], L[ong], or U[nicode]).
> This stands for "National" and comes from the ANSI-92 specification for
> (pardon the political incorrectness!).
> > I then tried saving the script with "Save As...". The choices
> > were "ANSI", "OEM (cp 437)", and "Unicode". Guess which
> > one I chose: it saved the file in the UTF-16 (or is it UCS-2?)
> > format that Notepad accepts (find the file attached).
> Technically speaking, UCS-2 might be more accurate, since SQL 7.0 is not
> surrogate aware. SQL Server 2000 has some surrogate awareness, though the
> sorting of such characters is currently undefined, so I guess you could
> call its format UTF-16 (although the docs do not).
> > About APIs, I guess that:
> > 1) The N prefix for string literals should be used as well;
> Yes.
> > 2) The details of the UTF form used are handled by the API.
> Yes, plus. :-)
> michka
> Michael Kaplan
> Trigeminal Software, Inc.
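On the client side, here is a minimal Python sketch of two points from the thread. It builds an N'...' literal (standard SQL escapes an embedded single quote by doubling it) and saves a script as UTF-16 little-endian with a byte order mark, a format Notepad accepts. The helper names, the table t, and the file name are hypothetical. In real code, passing values as parameters through the data access API is usually preferable to splicing literals into SQL text.

```python
import os
import tempfile

def nvarchar_literal(s: str) -> str:
    """Return s as a T-SQL N'...' literal, doubling embedded single quotes."""
    return "N'" + s.replace("'", "''") + "'"

def save_utf16le_with_bom(path: str, text: str) -> None:
    """Write text as UTF-16 little-endian preceded by the BOM bytes FF FE."""
    with open(path, "wb") as f:
        f.write(b"\xff\xfe" + text.encode("utf-16-le"))

# Hypothetical table and column names; \u00e9 is a non-ASCII test character.
script = "INSERT INTO t (name) VALUES (" + nvarchar_literal("O'Brien \u00e9") + ");\r\n"

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "script.sql")
    save_utf16le_with_bom(path, script)
    with open(path, "rb") as f:
        raw = f.read()

print(nvarchar_literal("O'Brien"))  # N'O''Brien'
print(raw[:4])                      # b'\xff\xfeI\x00'
```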

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT