Re: Software support costs (was: Nicest UTF)

From: Theodore H. Smith
Date: Fri Dec 10 2004 - 15:07:16 CST

    > Philippe,
    >> Also a broken opening tag for HTML/XML documents
    > In addition to not having endian problems UTF-8 is also useful when
    > tracing intersystem communications data because XML and other tags
    > are usually in the ASCII subset of UTF-8 and stand out making it
    > easier to find the specific data you are looking for.

    That was the whole point of my original thread.

    What you say is simply not true. You can process UTF-8 as bytes. Using
    your approach, even UTF-16 needs multiple code points to be treated as
    a single character, because of decomposed characters.
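
    To illustrate the point about decomposed characters, here is a small
    Python sketch (not from the original post). It assumes the standard
    unicodedata module and uses "é" as an arbitrary example; the helper
    utf16_units is a hypothetical name introduced only for this
    illustration. It shows that the decomposed form takes more than one
    unit in UTF-16 as well as in UTF-8.

```python
import unicodedata

# "é" as one precomposed code point (U+00E9).
precomposed = "\u00e9"
# The same character decomposed into 'e' + U+0301 (combining acute accent).
decomposed = unicodedata.normalize("NFD", precomposed)


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to encode s as UTF-16 (hypothetical helper)."""
    return len(s.encode("utf-16-le")) // 2


code_points = (len(precomposed), len(decomposed))
utf16_lengths = (utf16_units(precomposed), utf16_units(decomposed))
utf8_lengths = (len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))

print(code_points)    # code points per form
print(utf16_lengths)  # UTF-16 code units per form
print(utf8_lengths)   # UTF-8 bytes per form
```

    Whatever the encoding, one user-visible character can span several
    units, so fixed-width units alone do not give "one unit per
    character".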

    For most tasks (though not all), you can treat Unicode as bytes by
    using UTF-8.

    I've done this extensively, and it works just fine.
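
    As a rough sketch of what byte-wise processing can look like (an
    illustration added here, not the code referred to above), the Python
    below searches a UTF-8 buffer for ASCII tags without decoding it. The
    sample data and the function name find_tag_content are assumptions
    made up for this example. It relies on the UTF-8 property that bytes
    below 0x80 never occur inside a multi-byte sequence, so an ASCII tag
    can only match at a real tag.

```python
# Sample UTF-8 data with ASCII tags around non-ASCII text (made-up example).
data = "<name>Zoë 東京</name><city>München</city>".encode("utf-8")


def find_tag_content(buf: bytes, tag: bytes) -> bytes:
    """Return the raw bytes between <tag> and </tag>, or b"" if not found.

    Works purely on bytes: no decoding, no per-character iteration.
    """
    open_tag = b"<" + tag + b">"
    close_tag = b"</" + tag + b">"
    start = buf.find(open_tag)
    if start < 0:
        return b""
    start += len(open_tag)
    end = buf.find(close_tag, start)
    if end < 0:
        return b""
    return buf[start:end]


city = find_tag_content(data, b"city")
name = find_tag_content(data, b"name")

# A non-ASCII needle can also be searched for as bytes, because a valid
# UTF-8 sequence cannot match starting in the middle of another one.
has_tokyo = "東京".encode("utf-8") in data

# One of the tasks where bytes are not enough: counting characters.
byte_count = len(city)                  # bytes in "München"
char_count = len(city.decode("utf-8"))  # code points in "München"
```

    Searching, splitting and copying all work on the raw bytes; it is
    only for tasks such as counting or slicing by character that the
    byte length and the character count diverge.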

    The reason I repeat this is that even people like me (who are able
    to understand) could be confused if they receive the wrong information
    and none of the right information.

    If someone who was able to understand UTF-8 got both the right and
    wrong information, they'd be able to make up their own mind. But if
    they just got the wrong information, they could be misled, as I was.

    Which is why I'm repeating that you can treat UTF-8 as bytes most of
    the time, and it works perfectly well.

        Theodore H. Smith - Software Developer -
        Industrial strength string processing code, made easy.
        (If you believe that's an oxymoron, see for yourself.)

    This archive was generated by hypermail 2.1.5 : Fri Dec 10 2004 - 15:08:57 CST