Re: Encodings for SQL Databases

From: Michael (michka) Kaplan (michka@trigeminal.com)
Date: Mon Aug 07 2000 - 16:40:09 EDT


(please replace 16-byte with 16-bit, obviously).

michka
(who hopes that it is never important to need 16 bytes to represent a single
character, although realizes it may be needed if SETI makes contact with
someone out there!)
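
A minimal sketch in Python (not SQL Server code) of the two readings (a) and (b) asked about in the quoted message below. The helper names are hypothetical, and U+10000 is used only as a sample supplementary code point. It illustrates the counting difference between pairing surrogates and treating every 16-bit value separately; it makes no claim about what SQL Server actually does.

```python
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text encoded as UTF-16 (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def count_surrogate_aware(units):
    """Reading (a): a high surrogate followed by a low surrogate is one character."""
    count = 0
    i = 0
    while i < len(units):
        paired = (
            is_high_surrogate(units[i])
            and i + 1 < len(units)
            and is_low_surrogate(units[i + 1])
        )
        i += 2 if paired else 1
        count += 1
    return count


def count_surrogate_safe(units):
    """Reading (b): every 16-bit value is counted as its own character."""
    return len(units)


# Six supplementary characters, each stored as one surrogate pair (assumed sample data).
sample = "\U00010000" * 6
units = utf16_code_units(sample)

print(len(units))                    # number of 16-bit values stored
print(count_surrogate_aware(units))  # characters under reading (a)
print(count_surrogate_safe(units))   # characters under reading (b)
```

Under both readings the stored 16-bit values are identical, so the data round-trips unchanged; only the character count, and anything built on it such as collation or substring operations, differs.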

----- Original Message -----
From: "Michael (michka) Kaplan" <michka@trigeminal.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Monday, August 07, 2000 1:06 PM
Subject: Re: Encodings for SQL Databases

> I understand that part.... but you did not answer my question. :-)
>
> Are you saying that a value made up of twelve 16-byte values that was
> actually six surrogates would be treated as:
>
> a) Six characters with unknown sort characteristics, or
>
> b) Twelve characters, at least six of which would have unknown sort
> characteristics (since the first two bytes of a surrogate would not have a
> defined sort order, and the second two bytes might randomly coincide
> with an existing BMP value when treated as a separate Unicode code point).
>
> I would call (a) "surrogate aware", and (b) "surrogate safe", where "safe"
> would be defined as "at least the data did not get corrupted!". Obviously it
> is not entirely safe when you are considering collation and intrinsic string
> manipulation issues.
>
> I was left with the impression by a nameless dev in SQLS that at least in
> SQL 7.0, (b) was the case. I would LOVE to be incorrect on this point
> though.
>
> michka
>
> ----- Original Message -----
> From: "Michael Kung" <mkung@microsoft.com>
> To: "Unicode List" <unicode@unicode.org>
> Sent: Monday, August 07, 2000 12:52 PM
> Subject: RE: Encodings for SQL Databases
>
>
> > SQLServer 7.0 and SQLServer 2000 are surrogate safe on the
> > NCHAR/NVARCHAR/NTEXT storage. Until the ISO standard accepts the
> > surrogate assignment, no surrogate support statement provides any
> > substantial context.
> >
> > Michael
> >
> > -----Original Message-----
> > From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> > Sent: Monday, August 07, 2000 9:01 AM
> > To: Unicode List
> > Subject: Re: Encodings for SQL Databases
> >
> > From: <Marco.Cimarosti@icl.com>
> >
> > > According to the online help of SQL Server 7.0, you have to
> > > use the syntax N'abc' to write a Unicode literal in a SQL
> > > statement.
> > >
> > > The N prefix echoes the N in NCHAR and NVARCHAR, and
> > > parallels the L"abc" syntax of C (but I wonder, what's that "N"
> > > for? One would expect W[ide], L[ong], or U[nicode]).
> >
> > This stands for "National" and comes from the ANSI-92 specification for SQL
> > (pardon the political incorrectness!).
> >
> > > I then tried saving the script with "Save As...". The choices
> > > were "ANSI", "OEM (cp 437)", and "Unicode". Guess which
> > > one I chose, and it saved the file in the UTF-16 (or is it UCS-2?)
> > > format that is accepted by Notepad (find the file attached).
> >
> > Technically speaking, UCS-2 might be more accurate since SQL 7.0 does not
> > have surrogate awareness. SQL Server 2000 has some surrogate awareness, and
> > the sorting of such characters is currently undefined, but I guess you
> > could claim it to be UTF-16 (although the docs do not do so).
> >
> > > About API's, I guess that:
> > > 1) The N prefix for string literals should be used as well;
> >
> > Yes.
> >
> > > 2) The details of the UTF form used are handled by the API.
> >
> > Yes, plus. :-)
> >
> > michka
> >
> > Michael Kaplan
> > Trigeminal Software, Inc.
> > http://www.trigeminal.com/
> >
>
>


