RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: Joseph Boyle (Boyle@siebel.com)
Date: Mon Nov 04 2002 - 11:46:01 EST

    I haven't encountered UTF-32, SCSU, UTF-7, or BOCU-1 as transfer encodings.
    If they are used that way, they potentially raise the same BOM/signature
    question, unless all uses are established as BOM or agnostic, or non-BOM and
    agnostic. I do not expect it to come up much, because the formats/protocols
    that insist on non-BOM generally also insist on UTF-8 and/or ASCII
    compatibility, and because the newer encodings are only likely to be
    implemented by new software.

    -----Original Message-----
    From: Doug Ewell [mailto:dewell@adelphia.net]
    Sent: Monday, November 04, 2002 8:34 AM
    To: Unicode Mailing List
    Cc: Joseph Boyle; 'Michael (michka) Kaplan'
    Subject: Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

    Joseph Boyle <Boyle at siebel dot com> wrote:

    > Software currently under development could use the identifiers for
    > choosing whether to require or emit BOM, like the file requirements
    > checker I have to write, and ICU/uconv.

    Alternatively, software could use a completely separate flag to indicate
    whether a BOM is to be written or not. That is what SC UniPad does, for
    instance. Any type of Unicode file -- UTF-32, UTF-16, UTF-8, SCSU, even
    UTF-7 -- can have a BOM or not.
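
    A minimal sketch of the separate-flag approach, in Python. The names
    BOMS, encode_text, and has_bom are hypothetical and are not taken from
    SC UniPad or ICU; the table covers only the UTF-8/16/32 forms for which
    Python's codecs module defines BOM constants. The encoding name selects
    only the byte serialization, and the write_bom flag alone decides
    whether a signature is emitted.

```python
import codecs

# Hypothetical lookup table: BOM-free Python codec name -> BOM byte sequence.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}


def encode_text(text, encoding, write_bom):
    """Encode text; the BOM decision comes from write_bom, not the name."""
    body = text.encode(encoding)  # these codecs never add a BOM themselves
    return BOMS[encoding] + body if write_bom else body


def has_bom(data, encoding):
    """Report whether data starts with the BOM for the stated encoding."""
    return data.startswith(BOMS[encoding])
```

    A requirements checker along the lines mentioned above could call
    has_bom(data, "utf-8") and compare the result with whatever the target
    format demands.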

    Encoding identifiers that have been overloaded to denote the presence or
    absence of BOM, such as "UTF-16" to indicate there is a BOM and "UTF-16LE"
    or "-BE" to indicate there isn't, are often misused and may not be as useful
    as you think.
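
    One concrete illustration of such overloading, using Python's codec
    names as an assumed example (Python is not discussed in this thread):
    there the "utf-16" codec writes a BOM on encode and consumes it on
    decode, while "utf-16-le" does neither, so a BOM in the input survives
    as a U+FEFF character in the decoded text.

```python
import codecs

# "utf-16" prepends a BOM in the platform's native byte order.
with_bom = "A".encode("utf-16")
starts_with_bom = with_bom.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# "utf-16-le" emits no BOM at all.
without_bom = "A".encode("utf-16-le")

# Little-endian data that happens to carry a BOM.
data = codecs.BOM_UTF16_LE + without_bom

# Decoding with "utf-16" strips the BOM; "utf-16-le" keeps it as U+FEFF.
stripped = data.decode("utf-16")
kept = data.decode("utf-16-le")
```

    A caller who picks "utf-16-le" merely to state the byte order of
    BOM-prefixed data ends up with a stray U+FEFF at the start of the text.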

    -Doug Ewell
     Fullerton, California


