Re: UTF-8 'BOM'

From: Arcane Jill
Date: Thu Jan 20 2005 - 05:36:35 CST

    I enjoy slagging off Microsoft as much as anyone, but this is really out of
    place here. Microsoft did not invent the BOM. Rather, they correctly
    implemented the Unicode Standard. If the Unicode Standard were different in
    this regard, I'm sure that MS text files would follow suit.

    And ... turning your reasoning around a little here ... BOM-less text files (I
    would not be so crass as to call them "Unix text files") can just as easily
    cause problems on Unicode Conforming platforms, because the encoding is then

    If this forum turns into a "my OS is better than your OS" war, I'm leaving.

    -----Original Message-----
    From: [] On Behalf Of Hans Aberg
    Sent: 20 January 2005 10:41
    Subject: Re: UTF-8 'BOM'

    Didn't Unicode have a principle to merely provide the characters, but not
    impose requirements on their use? Then the requirement that programs should
    ignore the BOM contradicts that principle. It is that breach which causes
    problems on UNIX platforms.

    It is much better if the BOM is illegal in UTF-8. That does not prevent MS from
    using it, labelling it instead as a file format marker for MS text files. A
    program that then deals with MS text files must then know about the BOM and
    remove it when and if appropriate. At the same time, it does not cause any
    problems for programs that normally do not handle MS text files but only
    plain text: They are fine as they are. Everyone should be able to be happy.
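
A minimal sketch, in Python, of the kind of handling described above: a program that accepts such files checks for the three-byte UTF-8 signature (EF BB BF) and removes it before processing. The function names `strip_utf8_bom` and `decode_text` are hypothetical and chosen only for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_text(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if there is one."""
    # The 'utf-8-sig' codec skips an initial BOM and otherwise behaves
    # like plain 'utf-8'.
    return data.decode("utf-8-sig")
```

A program that decodes with plain `utf-8` instead keeps the signature as a leading U+FEFF character in the resulting string, which is the situation that trips up tools expecting plain text.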

    In fact, one idea might be to add \xFFFE and \xFFFF as delimiters for file
    format markers. Then programs that do not need such markers need not deal
    with them. Other programs can make use of them, or simply remove them at
    will. Such markers could also be used to alter the format within the same

      Hans Aberg

    This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 05:37:42 CST