Re: RE: Encodings for SQL Databases

From: addison@inter-locale.com
Date: Mon Aug 07 2000 - 18:28:21 EDT


Note that this:

a) only works for SQL Server (and for databases set up to use UTF-8 as the
alternate, or "national", character set). Oracle, for example, can be set up
so that the database character set itself is UTF-8, in which case you do
*NOT* want to use the "N" prefix notation.

b) It is the database connection (ODBC driver, JDBC driver, a proprietary
driver like OCI, etc.), not the database, that is at issue here. Assuming a
Unicode database, the driver will usually convert the query from the local
character set to Unicode, and the result set from Unicode back to the local
character set. If the local character set is itself some flavor of Unicode,
so much the better.
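The following is a minimal Python sketch that shows the direction of each conversion. The helper names `to_server` and `from_server` are hypothetical and do not belong to any real driver. Replacing unmappable characters with '?' is an assumption about one common way a lossy conversion behaves; a real driver might raise an error instead.

```python
# Hypothetical helpers that model what a driver does internally;
# they are NOT part of any real ODBC/JDBC/OCI API.

def to_server(sql: str, client_charset: str) -> str:
    """Model sending a query: the application's text is held in the client
    (local) character set, then converted to Unicode for the database.
    Characters the client charset cannot represent become '?'."""
    wire = sql.encode(client_charset, errors="replace")
    return wire.decode(client_charset)

def from_server(value: str, client_charset: str) -> str:
    """Model fetching a result: Unicode text from the database is converted
    to the client character set, with the same '?' substitution."""
    return value.encode(client_charset, errors="replace").decode(client_charset)

query = "SELECT id FROM greetings WHERE text = N'你好'"

print(to_server(query, "utf-8") == query)  # True: a Unicode client loses nothing
print(to_server(query, "cp1252"))          # SELECT id FROM greetings WHERE text = N'??'
print(from_server("你好", "cp1252"))       # ??
```

With a Unicode client character set, the round trip is lossless. With a single-byte Western one, the Chinese text is gone before the query ever reaches the database.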

Not all drivers support a flavor of Unicode. Your mileage may vary. Check
with your vendor carefully.

Some drivers must be explicitly set to use Unicode (e.g., the Oracle OCI driver
relies on the NLS_LANG environment variable to determine its character set).
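NLS_LANG has the form LANGUAGE_TERRITORY.CHARSET (for example, AMERICAN_AMERICA.UTF8). The Python sketch below extracts the character-set part so a setup script could warn when the client is not set to Unicode. The charset-to-codec table is a hypothetical, deliberately incomplete illustration. The example names AL32UTF8 and WE8MSWIN1252 are assumptions chosen for illustration; check your Oracle release for the names it actually supports.

```python
import os
from typing import Optional

# Hypothetical, deliberately incomplete map from Oracle character set names
# to Python codec names; for illustration only.
ORACLE_TO_PYTHON = {
    "AL32UTF8": "utf-8",
    "UTF8": "utf-8",          # approximate: differs for characters above U+FFFF
    "WE8ISO8859P1": "latin-1",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
}

UNICODE_CHARSETS = {"AL32UTF8", "UTF8"}

def nls_charset(nls_lang: Optional[str]) -> Optional[str]:
    """Return the CHARSET part of a LANGUAGE_TERRITORY.CHARSET value,
    or None when NLS_LANG is unset or names no character set."""
    if not nls_lang or "." not in nls_lang:
        return None
    charset = nls_lang.rsplit(".", 1)[1].strip().upper()
    return charset or None

def is_unicode_client(nls_lang: Optional[str]) -> bool:
    """True if NLS_LANG explicitly selects a Unicode client character set."""
    return nls_charset(nls_lang) in UNICODE_CHARSETS

if __name__ == "__main__":
    current = os.environ.get("NLS_LANG")
    charset = nls_charset(current)
    print("NLS_LANG =", current)
    print("client charset:", charset, "->", ORACLE_TO_PYTHON.get(charset or "", "unknown"))
    if not is_unicode_client(current):
        print("warning: client is not explicitly set to a Unicode character set")
```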

Hope this helps.

Addison

===========================================================
Addison P. Phillips Principal Consultant
Inter-Locale LLC http://www.inter-locale.com
Los Gatos, CA, USA mailto:addison@inter-locale.com

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services

On Mon, 7 Aug 2000 Marco.Cimarosti@icl.com wrote:

> ((( Sorry to those who see a mangled subject. It should read "RE: Encodings
> for SQL Databases" )))
>
> Jon Peck wrote:
> > Most of the major databases now support Unicode at some level, but what
> > is the best way to encode SQL statements for various database access
> > APIs? [...]
>
> According to the online help of SQL Server 7.0, you have to use the syntax
> N'abc' to write a Unicode literal in a SQL statement.
>
> The N prefix echoes the N in NCHAR and NVARCHAR, and parallels the L"abc"
> syntax of C (but I wonder, what's that "N" for? One would expect W[ide],
> L[ong], or U[nicode]).
>
> I tried the following code in Query Analyzer. The example comes from the
> help; I replaced the Danish string with a Chinese one to make sure that
> characters >= U+0100 behaved correctly.
>
> DECLARE @nstring nchar(8)
> SET @nstring = N'你好'
> SELECT UNICODE(SUBSTRING(@nstring, 2, 1)),
> NCHAR(UNICODE(SUBSTRING(@nstring, 2, 1)))
>
> The result is:
>
> ----------- ----
> 22909 好
>
> (1 row(s) affected)
>
> Here 22909 = 0x597D, which is indeed the code point of the 2nd character
> in the string: "好" (hao3).
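The arithmetic can be checked outside SQL Server. Below is a rough Python equivalent of the Query Analyzer test. It assumes that NCHAR(8), being fixed-length, pads the value with spaces. It also accounts for the fact that T-SQL SUBSTRING counts from 1 while Python indexes from 0.

```python
# Python stand-in for: DECLARE @nstring nchar(8); SET @nstring = N'你好'
# NCHAR(8) is fixed-length, so the value is padded with spaces to 8 characters.
nstring = "你好".ljust(8)

# SUBSTRING(@nstring, 2, 1): T-SQL is 1-based, Python is 0-based.
second = nstring[1:2]
code = ord(second)                  # UNICODE(...)
print(code, hex(code), chr(code))   # 22909 0x597d 好

# In UTF-16 little-endian, the character is the single code unit 0x597D.
unit = int.from_bytes(second.encode("utf-16-le"), "little")
print(unit == code)                 # True
```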
>
> The Chinese characters were visible in the Query Analyzer's window, as soon
> as I selected a proper font.
>
> I then tried saving the script with "Save As...". The choices were "ANSI",
> "OEM (cp 437)", and "Unicode". Guess which one I chose: it saved the
> file in the UTF-16 (or is it UCS-2?) format that Notepad accepts
> (find the file attached).
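For characters in the Basic Multilingual Plane, such as 你 and 好, UTF-16 and UCS-2 produce identical bytes. They differ only for characters above U+FFFF, which UTF-16 writes as surrogate pairs and UCS-2 cannot represent at all. The Python sketch below assumes that Query Analyzer's "Unicode" option writes the same layout as Windows Notepad's "Unicode": a little-endian byte-order mark followed by UTF-16LE text. The helper `needs_surrogates` is hypothetical.

```python
import codecs

script = "DECLARE @nstring nchar(8)\nSET @nstring = N'你好'\n"

# Write: byte-order mark followed by UTF-16 little-endian text.
data = codecs.BOM_UTF16_LE + script.encode("utf-16-le")
print(data[:2].hex())                     # fffe

# Read: the "utf-16" codec consumes the BOM and takes the byte order from it.
print(data.decode("utf-16") == script)    # True

def needs_surrogates(text: str) -> bool:
    """True if text has characters above U+FFFF, the only case where UTF-16
    (surrogate pairs) and UCS-2 (no representation) differ."""
    return any(ord(ch) > 0xFFFF for ch in text)

print(needs_surrogates(script))               # False: same bytes as UCS-2
print(len("\U00020000".encode("utf-16-le")))  # 4 bytes: one surrogate pair
```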
>
> As for APIs, I guess that:
>
> 1) The N prefix for string literals should be used as well;
>
> 2) The details of the UTF form used are handled by the API.
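The two guesses above can be contrasted in Python. The hypothetical helper `n_literal` builds a T-SQL N'...' literal by doubling embedded single quotes. A bound parameter avoids the literal entirely and leaves the encoding to the driver. The standard-library `sqlite3` module stands in here for an ODBC/JDBC/OCI driver. Whether a bound value travels to SQL Server as Unicode depends on how the real driver binds it.

```python
import sqlite3

def n_literal(text: str) -> str:
    """Hypothetical helper: build a T-SQL Unicode literal such as N'abc',
    doubling embedded single quotes. Prefer bound parameters where possible."""
    return "N'" + text.replace("'", "''") + "'"

print(n_literal("你好"))      # N'你好'
print(n_literal("O'Brien"))   # N'O''Brien'

# Bound parameter: no literal is written at all; the driver receives the
# value as text and handles its encoding. sqlite3 is only a stand-in here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE greetings (text TEXT)")
conn.execute("INSERT INTO greetings (text) VALUES (?)", ("你好",))
(value,) = conn.execute("SELECT text FROM greetings").fetchone()
print(value == "你好")        # True
conn.close()
```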
>
> _ Marco
>
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:06 EDT