RE: UTF-8 'BOM'

From: gpw@uniserve.com
Date: Thu Jan 20 2005 - 13:17:02 CST

    Quoting Peter Constable <petercon@microsoft.com>:

    > > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > > Behalf Of gpw@uniserve.com
    >
    >
    > > This is slightly revisionist. Long, long ago there were only
    > > big-endian encoding schemes with the BOM available to help
    > > detect problems. Microsoft insisted on writing datafiles on
    > > Intel platforms in a little-endian format. Once this practice
    > > was entrenched, the standard renamed the old defined practice
    > > as big-endian, documented the little-endian version and created
    > > a third with the BOM at the beginning to let people cope with
    > > finding either.
    >
    > This is a real hoot! Talk about revisionist. Microsoft and other
    > companies started writing datafiles on Intel platforms starting back in
    > -- what was it? 1981? 1975? Certainly earlier than 1983. Unicode 1.0
    > wasn't published until 1990.

    My comments need clarification. I meant that Microsoft insisted
    on writing datafiles containing Unicode data in little-endian
    format. I have no concerns about other data, indeed I was
    writing binary data in little-endian format in the 80s myself.

    But back when TUS 1.0 came out, I read the bit about Public
    Interchange to mean anything outside the confines of a program's
    core memory. Whether it was on the wire or written to a file
    that some other program could read, it should be in big-endian
    format. I cheered at this because I had lots of experience
    with inter-platform data exchange and so such a statement
    meant that there would be one fewer worry in dealing with
    multi-byte codepoint representations. And then, down the road,
    when I heard that Microsoft didn't do it that way, I lamented.

    Now it could be that my interpretation of The Right Way back then
    was faulty, but I know that my colleagues at the time came
    to the same interpretation of TUS 1.0. I have fuzzy memories
    that a wider circle of people also read it the way I just
    described but I won't lay claim to it. If our learned elders
    care to step forward to confirm or deny my interpretation I
    would be appreciative. Whatever the case, I won't beat the
    poor horse anymore.

    Geoffrey


