Re: Subject: Re: 32'nd bit & UTF-8

From: Christopher Fynn
Date: Thu Jan 20 2005 - 07:00:16 CST

    Hans Aberg wrote:
    > On 2005/01/20 02:28, Christopher Fynn wrote:

    >>>Whereas UTF-16 may have been widely used in some quarters up to today, my
    >>>impression is that this is more of a legacy thing, and that UTF-8 and UTF-32
    >>>will eventually become the only modern formats in use. Originally, 16-bit
    >>>integral types were used because it was thought that Unicode would not
    >>>exceed 2^16 code points. Now that it is clear that 16 bits do not suffice,
    >>>there is no point in using it in new software, except for legacy. UTF-32
    >>>will be used for speed, and UTF-8 for compatibility with ASCII and to avoid
    >>>the endian issue.

    >>If you choose Save as "Unicode" in MS applications, what do you get? The
    >>"legacy" of all the data being created today in MS Office etc. on Windows
    >>machines is going to be around for a while.

    > One can do as in the C++ standard with its .h headers, decide to keep UTF-16
    > for now as legacy, but indicate that it may be phased out in a later Unicode
    > version. Developers then get X number of years to change. It will be easy
    > to make new editors read the old formats but save them in the new formats.
    > Hans Aberg

    Something like 99% of text data uses only BMP characters, for which UTF-16
    is pretty efficient. Unless new scripts are adopted for modern languages, we
    all start using Egyptian hieroglyphs, or China creates thousands of new
    ideographic characters and makes their use mandatory in place of existing
    characters, this situation seems unlikely to change.
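
    The size argument can be checked with a rough sketch like the one below.
    The sample strings and the helper name encoded_sizes are assumptions made
    up for illustration, not data from this thread, and a handful of short
    strings says nothing about the real-world mix of text. It only shows how
    many bytes each encoding form needs for BMP versus non-BMP characters.

```python
# Compare the encoded size of a few sample strings in UTF-8, UTF-16 and UTF-32.
# The samples are illustrative assumptions, not measurements of real text.

def encoded_sizes(text):
    """Return the byte length of text in each encoding form (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

samples = {
    "ASCII": "hello",
    "Greek (BMP)": "αβγδε",
    "CJK (BMP)": "日本語",
    # U+13000 is an Egyptian hieroglyph, which lies outside the BMP,
    # so UTF-16 needs a surrogate pair (two 16-bit units) for it.
    "Egyptian hieroglyph (non-BMP)": "\U00013000",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

    For BMP text UTF-16 always takes two bytes per character, which beats
    UTF-8 for CJK and loses to it for ASCII. Outside the BMP all three forms
    take four bytes per character.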

    Didn't MS natively support Unicode (UCS-2) from the first version of
    Windows NT, before UTF-8 came along, and choose a 16-bit form because
    that was what Unicode was at the time NT was developed?

    Doesn't Mac OS X use UTF-16 for most of its native APIs, except for stuff
    that calls BSD system routines?

    - Chris
