Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 17:53:29 EDT


This is good. :-)

In your office, Peter did make the comment that SQL Server 2000 was
surrogate aware in terms of understanding that those four bytes represented
one character, but that obviously since that one character would be unknown,
it would be treated as any unknown character and its sort would be undefined
(as with all unknown characters).
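
For readers who want the "aware" versus "safe" distinction in concrete terms,
here is a minimal Python sketch. It is not SQL Server code and says nothing
about how either release is implemented. The helper names
(utf16_code_units, count_surrogate_aware, count_surrogate_safe) and the
sample code point U+10000 are assumptions chosen purely for illustration.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units (16-bit values) that encode `text`."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def count_surrogate_aware(units: list[int]) -> int:
    """Count characters, treating a high+low surrogate pair as ONE character."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A well-formed pair consumes two code units; anything else consumes one.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count


def count_surrogate_safe(units: list[int]) -> int:
    """Count every 16-bit value separately: data is preserved, pairs are not understood."""
    return len(units)


# Six supplementary characters (hypothetical sample data): twelve 16-bit values.
sample = "\U00010000" * 6
units = utf16_code_units(sample)

print(len(units))                    # number of 16-bit code units
print(count_surrogate_aware(units))  # pairs counted as single characters
print(count_surrogate_safe(units))   # every code unit counted on its own
```

Under these assumptions the "aware" count is half the "safe" count for a
string made up only of surrogate pairs. Either way the stored values are
untouched; only the interpretation differs.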

Thank you for the clarification (it's always nice to know one was right?
<g>).

michka

----- Original Message -----
From: "Michael Kung" <mkung@microsoft.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Monday, August 07, 2000 2:06 PM
Subject: RE: Encodings for SQL Databases

> From a sorting point of view, given no actual official surrogate character
> assignment yet, the surrogate character sort key is more or less undefined
> in 7.0. The data is not corrupted but is managed as part of the undefined
> category in terms of sorting. In other words, SQL 7.0 will treat those the
> same as user-defined characters.
>
> Again, until ISO defines those actual characters, any type of surrogate
> collation implementation has no substantial context. I would certainly like
> to know if other users have an index dependency on the surrogate collation
> as of today. If you do, please let me know, 'r' please, ASAP.
>
> Thanks,
>
> Michael
>
> -----Original Message-----
> From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> Sent: Monday, August 07, 2000 1:19 PM
> To: Michael Kung; Unicode List
> Subject: Re: Encodings for SQL Databases
>
> I understand that part.... but you did not answer my question. :-)
>
> Are you saying that a value made up of twelve 16-bit values that was
> actually six surrogate pairs would be treated as:
>
> a) Six characters with unknown sort characteristics, or
>
> b) Twelve characters, at least six of which would have unknown sort
> characteristics (since the first two bytes of a surrogate would not have a
> defined sort order and the second two bytes might randomly coincide with
> an existing BMP value when treated as a separate Unicode code point).
>
> I would call (a) "surrogate aware", and (b) "surrogate safe", where "safe"
> would be defined as "at least the data did not get corrupted!". Obviously
> it is not entirely safe when you are considering collation and intrinsic
> string manipulation issues.
>
> I was left with the impression by a nameless dev in SQLS that at least in
> SQL 7.0, (b) was the case. I would LOVE to be incorrect on this point
> though.
>
> michka
>
> ----- Original Message -----
> From: "Michael Kung" <mkung@microsoft.com>
> To: "Unicode List" <unicode@unicode.org>
> Sent: Monday, August 07, 2000 12:52 PM
> Subject: RE: Encodings for SQL Databases
>
>
> > SQLServer 7.0 and SQLServer 2000 are surrogate safe on the
> > NCHAR/NVARCHAR/NTEXT storage. Until the ISO standard accepts the
> > surrogate assignment, any surrogate support statement does not provide
> > any substantial context.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> > Sent: Monday, August 07, 2000 9:01 AM
> > To: Unicode List
> > Subject: Re: Encodings for SQL Databases
> >
> > From: <Marco.Cimarosti@icl.com>
> >
> > > According to the online help of SQL Server 7.0, you have to
> > > use the syntax N'abc' to write a Unicode literal in a SQL
> > > statement.
> > >
> > > The N prefix echoes the N in NCHAR and NVARCHAR, and
> > > parallels the L"abc" syntax of C (but I wonder, what's that "N"
> > > for? One would expect W[ide], L[ong], or U[nicode]).
> >
> > This stands for "National" and comes from the ANSI-92 specification for
> > SQL (pardon the political incorrectness!).
> >
> > > I then tried saving the script with "Save As...". The choices
> > > were "ANSI", "OEM (cp 437)", and "Unicode". Guess which
> > > one I chose, and it saved the file in the UTF-16 (or is it UCS-2?)
> > > format that is accepted by Notepad (find the file attached).
> >
> > Technically speaking, UCS-2 might be more accurate since SQL 7.0 does
> > not have surrogate awareness. SQL Server 2000 has some surrogate
> > awareness, and the sorting of such characters is currently undefined, but
> > I guess you could claim it to be UTF-16 (although the docs do not do so).
> >
> > > About API's, I guess that:
> > > 1) The N prefix for string literals should be used as well;
> >
> > Yes.
> >
> > > 2) The details of the UTF form used are handled by the API.
> >
> > Yes, plus. :-)
> >
> > michka
> >
> > Michael Kaplan
> > Trigeminal Software, Inc.
> > http://www.trigeminal.com/
> >
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT