RE: Java, SQL, Unicode and Databases

From: Joe_Ross@tivoli.com
Date: Fri Jun 23 2000 - 17:27:08 EDT


Michael, are you saying that the data type (char or nchar) doesn't matter? Are
you saying that if we just use UTF-16 or wchar_t interfaces to access the data
all will be fine and we will be able to store multilingual data even in fields
defined as char? Maybe things aren't as bad as I feared.

With respect to the web applications you describe, do they store the UTF-8 as
binary data? This wouldn't work for us, since we want other data mining
applications to be able to access the same data.

Thanks,
Joe

"Michael Kaplan (Trigeminal Inc.)" <v-michka@microsoft.com> on 06/23/2000
10:41:39 AM

To: Unicode List <unicode@unicode.org>, Joe Ross/Tivoli Systems@Tivoli Systems
cc: Hossein Kushki@IBMCA
Subject: RE: Java, SQL, Unicode and Databases

Microsoft is very COM-based for its actual data access methods, and COM
uses BSTRs that are BOM-less UTF-16. Because of that, the actual storage
format of any database ends up irrelevant, since it will be converted to
UTF-16 anyway.

Given that this is what the data layers do, performance is certainly better
if there does not have to be an extra call to the Windows
MultiByteToWideChar to convert UTF-8 to UTF-16. So from a Windows
perspective, not only is it no trouble, but it is also the best possible
solution!

In any case, I know plenty of web people who *do* encode their strings in
SQL Server databases as UTF-8 for web applications, since UTF-8 is their
preference. They are willing to take the hit of "converting themselves"
because when data is being read it is faster to go through no conversions at
all.
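
A minimal sketch of that "converting themselves" pattern, in Java since
that is the subject of the thread. It assumes the application encodes
once on write and hands the stored UTF-8 bytes straight to the web page
on read. The class and method names (Utf8RoundTrip, toUtf8, fromUtf8)
are hypothetical, and the actual database call is left out:

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Write side: a Java String is UTF-16 in memory, so the application
    // pays for one explicit conversion to UTF-8 before storing the bytes.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Only needed when some other consumer wants UTF-16 again; a page
    // served as UTF-8 can send the stored bytes without decoding them.
    static String fromUtf8(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s, a space, then two kanji.
        String original = "Gr\u00fc\u00dfe \u65e5\u672c";
        byte[] stored = toUtf8(original);
        String restored = fromUtf8(stored);

        System.out.println(original.length() + " UTF-16 code units");
        System.out.println(stored.length + " UTF-8 bytes");
        System.out.println("round trip intact: " + original.equals(restored));
    }
}
```

The sample string is 8 UTF-16 code units and 14 UTF-8 bytes, so byte
counts and character counts diverge as soon as the text leaves ASCII.
That matters when sizing the column that holds the encoded data.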

Michael

> ----------
> From: Joe_Ross@tivoli.com[SMTP:Joe_Ross@tivoli.com]
> Sent: Friday, June 23, 2000 7:55 AM
> To: Unicode List
> Cc: Unicode List; Hossein_Kushki%IBMCA@tivoli.com
> Subject: Re: Java, SQL, Unicode and Databases
>
>
>
> I think that this is also true for DB2 using UTF-8 as the database
> encoding. From an application perspective, MS SQL Server is the one
> that gives us the most trouble, because it doesn't support UTF-8 as a
> database encoding for char, etc.
> Joe
>
> Kenneth Whistler <kenw@sybase.com> on 06/22/2000 06:42:20 PM
>
> To: "Unicode List" <unicode@unicode.org>
> cc: unicode@unicode.org, kenw@sybase.com, mgm@sybase.com
>     (bcc: Joe Ross/Tivoli Systems)
> Subject: Re: Java, SQL, Unicode and Databases
>
>
>
>
> Jianping responded:
>
> >
> > Tex,
> >
> > Oracle has no special datatype requirement in the JDBC driver if you
> > use UTF8 as the database character set. In this case, all the text
> > datatypes in JDBC will support Unicode data.
> >
>
> The same thing is, of course, true for Sybase databases using UTF-8
> as the database character set, accessing them through a JDBC driver.
>
> But I think Tex's question is aimed at the much murkier area
> of what the various database vendors' strategies are for dealing
> with UTF-16 Unicode as a datatype. In that area, the answers for
> what a cross-platform application vendor needs to do and for how
> JDBC drivers might abstract differences in database implementations
> are still unclear.
>
> --Ken
>
>
>


