From: Marcin 'Qrczak' Kowalczyk (firstname.lastname@example.org)
Date: Fri Aug 06 2004 - 12:04:48 CDT
On Thu, 2004-08-05 at 15:52 -0500, John Tisdale wrote:
> Yet, if you are working with an application that must parse and
> manipulate text at the byte-level, the costliness of variable length
> encoding will probably outweigh the benefits of ASCII compatibility.
> In such a case the fixed length of UCS-2 will usually prove the better
> choice. This is why Windows NT and subsequent Microsoft operating
> systems, SQL Server 7 (and subsequent ones), XML, Java, COM, ODBC,
> OLEDB and the .NET framework are all built on UCS-2 Unicode encoding.
At least some of them use UTF-16 rather than UCS-2, e.g. Java 1.5. I
suspect most of them actually do, at least in theory.
> The uniform length of UCS provides a good foundation when it comes to
> complex data manipulation.
So this point does not apply to them (unless you count applications
that break on characters outside the BMP).
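To illustrate why fixed length does not hold for UTF-16, here is a small
Python sketch. The helper name utf16_code_units is made up for this
example; it only counts 16-bit code units using the standard "utf-16-le"
codec and is not taken from any of the products mentioned above.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for `text`.

    Hypothetical helper for illustration: encode as little-endian
    UTF-16 (no byte order mark) and divide the byte count by two.
    """
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u00e9"         # U+00E9, inside the BMP
non_bmp_char = "\U0001D11E"  # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# One code point each, but a different number of UTF-16 code units.
units_for_bmp = utf16_code_units(bmp_char)
units_for_non_bmp = utf16_code_units(non_bmp_char)
```

A program that assumes one 16-bit unit per character would treat the
second string as two characters, which is the kind of breakage meant
above.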
> There are other technical differences between these standards that you
> may want to consider that are beyond the scope of this article (such
> as how UTF-16 supports surrogate pairs but UCS-2 does not).
I don't like perpetuating the myth that Unicode is a 16-bit encoding
and that UCS-2 can represent all Unicode characters. Yes, in some places
you mention that there are also characters above the first 64k, but the
general impression from the article is that UCS-2 is one of several
equally functional representations of Unicode, while in fact it is the
only one which doesn't cover all code points.
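A rough Python sketch of the difference, assuming the usual UTF-16
surrogate arithmetic (subtract 0x10000, then split the remaining 20 bits
into two 10-bit halves). The function names to_surrogate_pair and
fits_in_ucs2 are invented for this illustration.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def fits_in_ucs2(text: str) -> bool:
    """True if every code point in `text` is at most U+FFFF.

    UCS-2 has no surrogate mechanism, so anything above U+FFFF
    has no representation in it at all.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


g_clef_pair = to_surrogate_pair(0x1D11E)   # U+1D11E MUSICAL SYMBOL G CLEF
g_clef_in_ucs2 = fits_in_ucs2("\U0001D11E")
```

UTF-16 can express U+1D11E as two code units, while a strict UCS-2
implementation has no way to express it.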
-- 
   __("<         Marcin Kowalczyk
   \__/       email@example.com
    ^^     http://qrnik.knm.org.pl/~qrczak/