Re: Microsoft Unicode Article Review

From: Marcin 'Qrczak' Kowalczyk
Date: Fri Aug 06 2004 - 12:04:48 CDT

    On Thu, 05-08-2004, at 15:52 -0500, John Tisdale wrote:

    > Yet, if you are working with an application that must parse and
    > manipulate text at the byte-level, the costliness of variable length
    > encoding will probably outweigh the benefits of ASCII compatibility.
    > In such a case the fixed length of UCS-2 will usually prove the better
    > choice. This is why Windows NT and subsequent Microsoft operating
    > systems, SQL Server 7 (and subsequent ones), XML, Java, COM, ODBC,
    > OLEDB and the .NET framework are all built on UCS-2 Unicode encoding.

    At least some of them use UTF-16, not UCS-2, e.g. Java 1.5. I suspect
    that most of them do, at least in theory.

    > The uniform length of UCS provides a good foundation when it comes to
    > complex data manipulation.

    And thus this point does not apply to them (unless you count apps that
    break on characters outside the BMP).
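
    As an illustration of why the fixed-length argument fails for UTF-16,
    here is a minimal Python sketch. The helper name utf16_code_units is
    made up for this example, and it assumes Python 3.3 or later, where
    len() on a string counts code points rather than 16-bit units.

```python
def utf16_code_units(text: str) -> int:
    """Return how many 16-bit code units `text` occupies in UTF-16."""
    # "utf-16-le" emits no byte order mark, so every code unit is
    # exactly two bytes and the division below is exact.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u0105"          # LATIN SMALL LETTER A WITH OGONEK, inside the BMP
non_bmp_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

print(utf16_code_units(bmp_char))      # 1
print(utf16_code_units(non_bmp_char))  # 2
print(len(non_bmp_char))               # 1 (one code point)
```

    Code that indexes UTF-16 text by 16-bit unit and treats each unit as
    one character would split the second string in the middle of a
    surrogate pair.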

    > There are other technical differences between these standards that you
    > may want to consider that are beyond the scope of this article (such
    > as how UTF-16 supports surrogate pairs but UCS-2 does not).

    I don't like perpetuating the myth that Unicode is a 16-bit encoding
    and that UCS-2 can represent all Unicode characters. Yes, in some places
    you mention that there are also characters above the first 64K code
    points, but the general impression from the article is that UCS-2 is one
    of several equally functional representations of Unicode, while in fact
    it is the only one that doesn't cover all code points.
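
    The difference can be sketched in a few lines of Python. The function
    names to_utf16 and fits_in_ucs2 are hypothetical, chosen only for this
    illustration; the arithmetic is the standard UTF-16 surrogate-pair
    mapping.

```python
from typing import List


def fits_in_ucs2(cp: int) -> bool:
    """True if the code point fits in a single 16-bit unit."""
    # Surrogate code points are reserved and are not characters.
    return 0 <= cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


def to_utf16(cp: int) -> List[int]:
    """Encode one code point as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp <= 0xFFFF:
        return [cp]                        # BMP: one unit, same as UCS-2
    offset = cp - 0x10000                  # 20 bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return [high, low]


print(fits_in_ucs2(0x1D11E))                  # False
print([hex(u) for u in to_utf16(0x1D11E)])    # ['0xd834', '0xdd1e']
```

    UCS-2 has no counterpart to the second branch, so any code point from
    U+10000 to U+10FFFF simply has no UCS-2 representation.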

       __("<         Marcin Kowalczyk
