RE: Names for UTF-8 with and without BOM

From: Joseph Boyle (Boyle@siebel.com)
Date: Sat Nov 02 2002 - 13:26:36 EST


    These are listed as examples to demonstrate the idea of a configuration file
    listing encoding constraints. The fact that each constraint is arguable is a
    good reason to make the constraints configurable, and therefore to have
    names to distinguish BOM and non-BOM UTF-8.
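A minimal sketch of the configuration idea, assuming a hypothetical one-rule-per-line "extension constraint" format that reuses the constraint names from the table quoted below (UTF-8BOM, UTF-8N, UTF-8, Codepage, ASCII). `CONFIG_TEXT`, `parse_config`, `is_utf8` and `satisfies` are illustrative names, not part of any existing tool.

```python
import codecs

# Hypothetical configuration text: one "extension constraint" pair per line,
# using the constraint names from the table quoted in this thread.
CONFIG_TEXT = """\
.txt UTF-8BOM
.xml UTF-8N
.htm UTF-8
.rc  Codepage
.swt ASCII
"""


def parse_config(text: str) -> dict:
    """Map each file extension (lower-cased) to its constraint name."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        ext, constraint = line.split()
        rules[ext.lower()] = constraint
    return rules


def is_utf8(data: bytes) -> bool:
    """True if data decodes as strict UTF-8 (a leading BOM decodes as U+FEFF)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def satisfies(constraint: str, data: bytes) -> bool:
    """Check raw file bytes against one named encoding constraint."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    if constraint == "UTF-8BOM":   # UTF-8 and must start with EF BB BF
        return has_bom and is_utf8(data)
    if constraint == "UTF-8N":     # UTF-8 and must NOT start with a BOM
        return not has_bom and is_utf8(data)
    if constraint == "UTF-8":      # UTF-8, BOM optional
        return is_utf8(data)
    if constraint == "ASCII":      # 7-bit bytes only
        return data.isascii()
    if constraint == "Codepage":   # legacy code page: not validated here
        return True
    raise ValueError(f"unknown constraint: {constraint}")
```

With the rules kept in data, relaxing a policy (say `.xml` from `UTF-8N` to `UTF-8`) is a one-line config change rather than a code change, which only works if the BOM and non-BOM forms have distinct names.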

    -----Original Message-----
    From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
    Sent: Saturday, November 02, 2002 10:16 AM
    To: Joseph Boyle; Mark Davis; Murray Sargent
    Cc: unicode@unicode.org
    Subject: Re: Names for UTF-8 with and without BOM

    From: "Joseph Boyle" <Boyle@siebel.com>

    > Type Encoding Comment
    > .txt UTF-8BOM We want plain text files to have BOM to distinguish from
    > legacy codepage files

    Not really required, just optional: the performance hit of checking that
    a file is valid UTF-8 is pretty minor. But people do open some *huge* text
    files in things like Notepad....
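A sketch of the "just validate it" approach for telling UTF-8 from legacy code-page files without relying on a BOM. It streams the file in chunks through Python's strict incremental UTF-8 decoder, so memory use stays flat even for huge files. Treating "decodes as UTF-8" as "is UTF-8" is a heuristic assumed here: a short legacy file can happen to be valid UTF-8, and pure ASCII is reported as UTF-8 (harmless, since the bytes are identical). The function names are illustrative.

```python
import codecs
import io
from typing import BinaryIO


def sniff_encoding(stream: BinaryIO, chunk_size: int = 1 << 16) -> str:
    """Classify a byte stream as 'UTF-8BOM', 'UTF-8' or 'legacy'."""
    first = stream.read(chunk_size)
    if first.startswith(codecs.BOM_UTF8):
        return "UTF-8BOM"  # the signature is trusted; the body is not re-validated
    # Strict incremental decoder: a multi-byte sequence split across two
    # chunks is buffered rather than reported as an error.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        chunk = first
        while chunk:
            decoder.decode(chunk)
            chunk = stream.read(chunk_size)
        decoder.decode(b"", final=True)  # raises if the data ends mid-sequence
    except UnicodeDecodeError:
        return "legacy"
    return "UTF-8"


def sniff_bytes(data: bytes) -> str:
    """Convenience wrapper for in-memory data."""
    return sniff_encoding(io.BytesIO(data))
```

The cost is one linear pass over the bytes, which is the "pretty minor" overhead in question; only a BOM lets a reader skip that pass.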

    > .xml UTF-8N Some XML processors may not cope with BOM

    Maybe they need to upgrade? Since people often edit these files in
    Notepad, many of them are going to have a BOM. A parser that cannot accept
    this reality is not going to make it very long.
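Until every processor copes with a BOM, a pipeline can strip it before parsing. A minimal sketch, using Python's `xml.etree.ElementTree` only as a stand-in for "an XML processor" (the sketch does not claim ElementTree itself needs this workaround); `strip_utf8_bom` and `parse_xml_bytes` are illustrative names.

```python
import codecs
import xml.etree.ElementTree as ET


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def parse_xml_bytes(data: bytes) -> ET.Element:
    # Strip the BOM first, as a pre-processing step for any processor
    # that cannot cope with a leading EF BB BF.
    return ET.fromstring(strip_utf8_bom(data))
```

For text rather than bytes, Python's built-in `utf-8-sig` codec does the same on decode: `data.decode("utf-8-sig")` removes one leading BOM if present and otherwise behaves like plain `utf-8`.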

    > .htm UTF-8 We want HTML to be UTF-8 but will not insist on BOM

    Same as text, with the bonus that a higher-level protocol (such as an
    HTTP charset parameter) may also declare the encoding. It can still go
    either way.

    > .rc Codepage Unfortunately compiler insists on these being codepage.

    They can be UTF-16, too (at least on Win32!).
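If `.rc` files may be either UTF-16 or code page, a checker can tell them apart by the BOM. The sketch below is illustrative only and does not describe how the resource compiler itself decides; the `cp1252` fallback is an assumed project code page, and BOM-less UTF-16 is simply decoded as code page rather than detected.

```python
import codecs
from typing import Optional, Tuple

# Longest signatures first: the UTF-32LE BOM (FF FE 00 00) begins with
# the UTF-16LE BOM (FF FE), so checking UTF-16LE first would misfire.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_from_bom(data: bytes) -> Optional[Tuple[bytes, str]]:
    """Return (bom, codec name) for a recognised BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return bom, name
    return None


def decode_rc(data: bytes, fallback_codepage: str = "cp1252") -> str:
    """Decode a resource script: BOM-marked Unicode, else the legacy code page."""
    found = encoding_from_bom(data)
    if found is None:
        # No BOM: assume the (hypothetical) project code page.
        return data.decode(fallback_codepage)
    bom, name = found
    return data[len(bom):].decode(name)
```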

    > .swt ASCII Nonlocalizable internal format, must be ASCII.

    Haven't run across these, but note that if the format is not UTF-8 then
    the BOM question does not apply....
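For an ASCII-only rule like the `.swt` one, a checker only needs to confirm every byte is below 0x80. Such a file decodes identically as ASCII and as UTF-8, and a BOM would itself be a violation at offset 0, which is one way to read the point that the BOM question does not apply. A minimal sketch; `first_non_ascii` is an illustrative name that reports the offending offset for error messages.

```python
from typing import Optional


def first_non_ascii(data: bytes) -> Optional[int]:
    """Offset of the first byte >= 0x80, or None if data is pure 7-bit ASCII."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset
    return None
```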



    This archive was generated by hypermail 2.1.5 : Sat Nov 02 2002 - 13:55:19 EST