RE: Subject: Re: 32'nd bit & UTF-8

From: Martin Duerst
Date: Mon Jan 24 2005 - 02:07:54 CST


    At 13:24 05/01/20, Peter Constable wrote:

    >If anyone ever assumed UTF-8 is compatible with ASCII, they were
    >mistaken. An ASCII processor can expect to receive octets strictly in
    >the range 0 - 127, period, whereas clearly UTF-8 data can contain octets
    >outside that range.
    >ASCII is forward compatible with UTF-8 (a UTF-8 processor can process
    >ASCII data), not the other way around.

    Well, there is more to it than that. There is a large class
    of programs and tools that process 8-bit data but act only
    on bytes with ASCII values, passing all other bytes through
    untouched. Such tools work with many encodings, starting with
    iso-8859-1, but not with some others such as Shift_JIS, whose
    two-byte characters can have a second byte in the ASCII range.
    UTF-8 without a BOM works with such tools; UTF-8 with a BOM
    does not.
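To make this concrete, here is a minimal Python sketch of such a byte-oriented tool. The functions `positions_of_byte` and `starts_with_shebang` are hypothetical stand-ins for "a tool that only looks at ASCII values". The codec names (`iso-8859-1`, `utf-8`, `utf-8-sig`, `shift_jis`) are Python's standard codecs; `utf-8-sig` is the one that writes a BOM.

```python
def positions_of_byte(data: bytes, target: int = 0x5C) -> list:
    """Hypothetical ASCII-only tool: report offsets of one ASCII byte
    (default 0x5C, backslash) and treat every other byte as opaque."""
    return [i for i, b in enumerate(data) if b == target]


def starts_with_shebang(data: bytes) -> bool:
    """Hypothetical ASCII-only check: does the data begin with '#!'?"""
    return data.startswith(b"#!")


so = "\u30bd"  # KATAKANA LETTER SO; the character contains no backslash

# In iso-8859-1 and UTF-8, non-ASCII characters never use bytes below 0x80,
# so the tool finds no spurious backslash.
print(positions_of_byte("\u00e9".encode("iso-8859-1")))  # []
print(positions_of_byte(so.encode("utf-8")))             # []

# Shift_JIS encodes U+30BD as 0x83 0x5C: the second byte looks like '\'.
print(positions_of_byte(so.encode("shift_jis")))         # [1]

script = "#!/bin/sh\necho hi\n"
print(starts_with_shebang(script.encode("utf-8")))       # True
print(starts_with_shebang(script.encode("utf-8-sig")))   # False: EF BB BF comes first
```

The `#!` check stands in for any tool that inspects the first bytes of a file. The BOM bytes EF BB BF are not ASCII, so the check fails even though the rest of the file is unchanged.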

    Another point, as already mentioned, is that US-ASCII text
    encoded as UTF-8 is still US-ASCII if there is no BOM, but is
    no longer US-ASCII once a BOM is added.
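A small Python sketch of this second point, assuming Python's standard `utf-8-sig` codec as the way to produce UTF-8 with a BOM. `is_us_ascii` is a hypothetical helper that just checks that every octet is in the range 0 to 127.

```python
def is_us_ascii(data: bytes) -> bool:
    """True if every octet is in the US-ASCII range 0-127."""
    return all(b < 0x80 for b in data)


text = "Hello, world"
plain = text.encode("utf-8")          # UTF-8 without a BOM
with_bom = text.encode("utf-8-sig")   # same text, prefixed with EF BB BF

print(plain == text.encode("ascii"))  # True: byte-for-byte identical to ASCII
print(is_us_ascii(plain))             # True
print(is_us_ascii(with_bom))          # False: the BOM bytes are above 127
print(with_bom[:3].hex())             # efbbbf
```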

    Regards, Martin.

    This archive was generated by hypermail 2.1.5 : Mon Jan 24 2005 - 19:27:30 CST