RE: PRODUCING and DESCRIBING UTF-8 with and without BOM

From: Joseph Boyle (Boyle@siebel.com)
Date: Mon Nov 04 2002 - 11:40:27 EST


    >INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES.
    >If they are there then there is a BOM. Simple.

    Yes, it's trivial to check. What's missing is the notation to tell the
    checker what to check for.
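    One way to give the checker that notation is an explicit policy value passed alongside the charset name. The sketch below is a hypothetical Python illustration, not an existing tool's interface: `BomPolicy` and `check_bom` are invented names, and the only real API used is `codecs.BOM_UTF8`, the three bytes EF BB BF.

```python
import codecs
from enum import Enum


class BomPolicy(Enum):
    """Hypothetical notation for what a file-requirements checker should enforce."""
    REQUIRED = "required"    # file must start with EF BB BF
    FORBIDDEN = "forbidden"  # file must not start with EF BB BF
    OPTIONAL = "optional"    # either form is acceptable


def check_bom(data: bytes, policy: BomPolicy) -> bool:
    """Return True if `data` satisfies `policy` with respect to the UTF-8 BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    if policy is BomPolicy.REQUIRED:
        return has_bom
    if policy is BomPolicy.FORBIDDEN:
        return not has_bom
    return True  # OPTIONAL: accept both forms


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
    without_bom = "hello".encode("utf-8")
    print(check_bom(with_bom, BomPolicy.REQUIRED))     # True
    print(check_bom(without_bom, BomPolicy.REQUIRED))  # False
    print(check_bom(without_bom, BomPolicy.OPTIONAL))  # True
```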

    >> The inability to update to one standard all possible consuming
    >> software one might encounter (or for that matter human customers'
    >> opinions) is precisely why producing and checking software has to
    >> handle both possibilities.
    >But the "both possibilities" are trivial and it's by no means difficult
    >to do. Having a good program that refuses to do a little work to handle
    >three bytes is like someone who runs a 100 mile marathon and then
    >refuses to cross the finish line because the line is yellow instead of
    >white.

    Yes, this is a good description of the sad state of existing software.
    Noting that failure to standardize is irritating and unnecessary doesn't
    make existing software go away.

    -----Original Message-----
    From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
    Sent: Monday, November 04, 2002 8:08 AM
    To: Joseph Boyle; Unicode Mailing List
    Subject: Re: PRODUCING and DESCRIBING UTF-8 with and without BOM

    From: "Joseph Boyle" <Boyle@siebel.com>

    Joseph,

    > Software currently under development could use the identifiers for
    > choosing whether to require or emit BOM, like the file requirements
    > checker I have to write, and ICU/uconv.

    Let's separate that into the two issues it represents:

    EMITTING: They could simply choose globally whether to emit the BOM or not.
    If they wanted to get "fancy" they could have a command line option which
    said whether to emit the bytes or not. But that is optional.
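    A minimal sketch of the EMITTING side in Python, assuming a hypothetical `--bom` command-line flag that defaults to off (a global "no BOM" choice). It relies on Python's standard `utf-8-sig` codec, which writes the BOM when encoding; `encode_utf8` and `build_parser` are illustrative names, not part of any existing converter.

```python
import argparse


def encode_utf8(text: str, emit_bom: bool) -> bytes:
    # "utf-8-sig" prepends EF BB BF when encoding; plain "utf-8" does not.
    return text.encode("utf-8-sig" if emit_bom else "utf-8")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert text to UTF-8.")
    # Hypothetical flag: off by default, so "no BOM" is the global choice.
    parser.add_argument("--bom", action="store_true", help="emit a UTF-8 BOM")
    return parser


if __name__ == "__main__":
    # Demo with an explicit argument list instead of the real command line.
    args = build_parser().parse_args(["--bom"])
    print(encode_utf8("hi", args.bom))  # b'\xef\xbb\xbfhi'
```

    In a real converter the encoded bytes would be written to a file or to `sys.stdout.buffer`; the demo passes an explicit argument list so it runs without a command line.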

    INCOMING TEXT: Trivial to simply check. I say (once again) it's THREE BYTES.
    If they are there then there is a BOM. Simple.
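    The check described above amounts to comparing the first three bytes against EF BB BF. A minimal Python sketch follows; `strip_utf8_bom` is a hypothetical helper, and it only looks for the UTF-8 BOM, not the UTF-16 or UTF-32 ones.

```python
import codecs
from typing import Tuple

UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Report whether `data` starts with the UTF-8 BOM and return it without one."""
    if data[:3] == UTF8_BOM:
        return True, data[3:]
    return False, data


if __name__ == "__main__":
    print(strip_utf8_bom(b"\xef\xbb\xbfhi"))  # (True, b'hi')
    print(strip_utf8_bom(b"hi"))              # (False, b'hi')
```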

    > The inability to update to one standard all possible consuming
    > software one might encounter (or for that matter human customers'
    > opinions) is precisely why producing and checking software has to
    > handle both possibilities.

    But the "both possibilities" are trivial and it's by no means difficult to do.
    Having a good program that refuses to do a little work to handle three bytes
    is like someone who runs a 100 mile marathon and then refuses to cross the
    finish line because the line is yellow instead of white.
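    For software that only needs to read text and accept both forms, Python's standard `utf-8-sig` codec already does this: when decoding, it skips a leading BOM if one is present and otherwise behaves like plain `utf-8`. The wrapper `read_utf8_text` is a hypothetical name used for illustration.

```python
def read_utf8_text(data: bytes) -> str:
    # "utf-8-sig" drops one leading BOM if there is one and otherwise
    # decodes exactly like "utf-8", so both forms yield the same string.
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    for raw in (b"\xef\xbb\xbfhello", b"hello"):
        print(repr(read_utf8_text(raw)))  # 'hello' both times
```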

    > What would you mean by "the right thing" as far as emitting BOM?
    > Should file conversion programs only allow output of non-BOM? (or
    > with-BOM?) Or should they take the specification in an argument
    > separate from the charset name? As said before this unnecessarily
    > requires extra logic.

    Already answered: they can make a global decision, like Notepad or other
    programs do. Especially if the programmer finds making it a setting a
    huge hardship, they can skip that work and simply choose whether they want
    it or not....

    I plead with you -- keep it SIMPLE. :-)

    MichKa



    This archive was generated by hypermail 2.1.5 : Mon Nov 04 2002 - 12:22:02 EST