Re: UTF-8 'BOM'

From: Hans Aberg
Date: Thu Jan 20 2005 - 04:40:54 CST


    Didn't Unicode have a principle of merely providing the characters, without
    imposing requirements on their use? If so, the requirement that programs
    should ignore the BOM contradicts that principle, and it is that breach
    that causes problems on UNIX platforms.

    It would be much better if the BOM were illegal in UTF-8. That would not
    prevent MS from using it; MS could instead label it as a file format marker
    for MS text files. A program that deals with MS text files must then know
    about the BOM and remove it when and if appropriate. At the same time, it
    causes no problems for programs that normally handle only plain text and
    not MS text files: they are fine as they are. Everyone should be able to be
    happy.
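
    As a rough illustration of the "know about the BOM and remove it" step, a
    minimal Python sketch might look like the following. The function names
    strip_utf8_bom and read_ms_text are hypothetical, chosen only for this
    example; codecs.BOM_UTF8 is the standard library constant for the bytes
    EF BB BF.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def read_ms_text(data: bytes) -> str:
    """Hypothetical reader for 'MS text files': drop the marker, then decode."""
    return strip_utf8_bom(data).decode("utf-8")


if __name__ == "__main__":
    marked = codecs.BOM_UTF8 + "plain text".encode("utf-8")
    # A tool that knows about the marker removes it before decoding.
    print(repr(read_ms_text(marked)))
    # A tool that does not know about it sees U+FEFF as the first character.
    print(repr(marked.decode("utf-8")))
```

    Python's built-in "utf-8-sig" codec does the same removal on decoding, so
    only programs that choose to handle such files need to opt in; a plain
    "utf-8" decode leaves the marker in place as an ordinary U+FEFF character.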

    In fact, one idea might be to add \xFFFE and \xFFFF as delimiters for file
    format markers. Then programs that do not need such markers need not deal
    with them. Other programs can make use of them, or simply remove them at
    will. Such markers could also be used to alter the format within the same

      Hans Aberg

    This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 04:42:03 CST