Re: Conformance (was UTF, BOM, etc)

From: Peter Kirk
Date: Sat Jan 22 2005 - 19:01:28 CST

    On 22/01/2005 18:41, Lars Kristan wrote:

    > Peter Kirk wrote:
    > > This is interesting speculation. But with any code page there are
    > > bytes or combinations of bytes which are illegal or undefined in
    > > that code page.
    > In most SBCS encodings, there are none. Those that are, typically do
    > not occur. ...

    What do you mean? Of course there are invalid bytes in many legacy
    encodings including Windows-1252. Of course these do not occur in
    properly encoded text. If they are found in such text, the text is
    garbage or has been mis-labelled.
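
    As a rough illustration of the point above, here is a minimal sketch
    that lists the bytes with no assigned character in Windows-1252. It
    assumes Python's "cp1252" codec, which follows the published Unicode
    mapping table; Windows' own conversion routines may treat these bytes
    differently. The function name undefined_cp1252_bytes is made up for
    this example.

```python
def undefined_cp1252_bytes():
    """Return every byte value that strict cp1252 decoding rejects."""
    bad = []
    for value in range(256):
        try:
            # Strict decoding raises on bytes the mapping table leaves undefined.
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            bad.append(value)
    return bad


if __name__ == "__main__":
    print([hex(b) for b in undefined_cp1252_bytes()])
```

    Under that assumption, only a handful of byte values are rejected, all
    of them in the 0x80-0x9F range.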

    > ...
    > > And if, speculatively, Windows were to support UTF-8 as a code
    > > page, the situation would be unchanged. Byte sequences which are
    > > illegal UTF-8 are garbage in that code page and so would correctly
    > > be replaced by U+FFFD.
    > Which is exactly what needs to be changed. 128 codepoints, remember?
    No. Garbage is garbage. Stop rifling around in other people's garbage.
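
    To make the replacement behaviour concrete, here is a small sketch of
    what "replaced by U+FFFD" looks like in practice. It uses Python's
    built-in UTF-8 decoder with errors="replace" purely as a stand-in; it
    is not a claim about how Windows does or would handle a UTF-8 code
    page. The helper name decode_lossy is invented for this example.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for any ill-formed byte sequence."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0xFF can never appear in well-formed UTF-8, so it is replaced.
    text = decode_lossy(b"a\xffb")
    print(text == "a" + REPLACEMENT + "b")
    # Well-formed input passes through unchanged.
    print(decode_lossy("caf\u00e9".encode("utf-8")))
```

    Note that the original byte is not recoverable from the decoded
    string, which is exactly the loss being argued about in this thread.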

    > ...
    > Microsoft can provide all UTF-16 applications. But the console can
    > only be improved by using UTF-8. This is the only solution that also
    > works with existing applications.
    Microsoft is not interested in console applications. Elsewhere you wrote:

    > It is Windows that gives me problems now. Customers want Unicode
    > output in console. Why doesn't Windows support UTF-8 locale? Not that
    > I'm being pesky about it, UTF-16 would also be fine. As long as I can
    > get Unicode through stdout. Well, and of course be able to feed it to
    > some other application.

    You can probably find some third party application which can simulate a
    Unix console with UTF-8 support on top of Windows, and that should meet
    your customers' needs. But don't expect Microsoft to support such things
    at the system level. Windows is a GUI system, and the only built-in
    console is for partial back-compatibility with DOS, which has no Unicode
    support.

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Sat Jan 22 2005 - 19:41:49 CST