Re: <META tag (was Re: GSM and Unicode)

From: YTang0648@aol.com
Date: Tue Nov 04 2003 - 18:05:23 EST


    In a message dated 11/4/2003 2:31:00 PM Pacific Standard Time, JD@BD8.COM
    writes:
    At 5:18 pm -0500 4/11/03, YTang0648@aol.com wrote:

    > According to the HTML standard (see
    > http://www.w3.org/TR/html4/struct/global.html#h-7.4.4 )
    > the right way to specify the charset in HTML is to use the
    > http-equiv attribute in the META tag with the value "Content-Type" and
    > put the charset value after "text/html; charset=" in the value
    > of the content attribute. The HTML specification does not specify
    > the order of the http-equiv and content attributes;
    I believe this part is still true.
    > neither does it
    > prohibit other attributes (such as charset=UTF-8) from being present.
    I think I was wrong about this part.
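To illustrate the point about attribute order, here is a minimal Python sketch. It assumes only the standard library's html.parser, and the names MetaCharsetFinder and find_meta_charset are hypothetical, not taken from any browser. It reads the charset from the content attribute of a META http-equiv="Content-Type" tag. Because the parser hands over all of a tag's attributes at once, it does not matter whether http-equiv comes before or after content.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: finds the charset declared via
    <META http-equiv="Content-Type" content="text/html; charset=...">.

    Attribute order is irrelevant because handle_starttag receives the
    complete attribute list of the tag in one call.
    """

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names (not values).
        if tag != "meta" or self.charset is not None:
            return
        values = dict(attrs)
        # Valueless attributes come through as None, hence the `or ""`.
        if (values.get("http-equiv") or "").lower() != "content-type":
            return
        content = values.get("content") or ""
        # content looks like: text/html; charset=UTF-8
        for param in content.split(";")[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "charset" and value:
                # Charset names are case-insensitive; normalize to lowercase.
                self.charset = value.strip().strip('"').lower()
                return


def find_meta_charset(html_text):
    """Return the META-declared charset (lowercased), or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    finder.close()
    return finder.charset
```

Both `<META http-equiv="Content-Type" content="text/html; charset=UTF-8">` and the same tag with the two attributes swapped give "utf-8". The sketch stops at the first matching META tag and ignores the HTTP Content-Type header, which HTML 4.01 (section 5.2.2) ranks above the META declaration.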

    Having a charset=UTF-8 attribute in the META element makes the page an
    "invalid HTML document".
    The interesting part is the following passage in HTML 4.01:
    http://www.w3.org/TR/html401/appendix/notes.html#h-B.1

    [beginning of quote]
    B.1 Notes on invalid documents
    This specification does not define how conforming user agents handle general
    error conditions, including how user agents behave when they encounter
    elements, attributes, attribute values, or entities not specified in this document.
    However, to facilitate experimentation and interoperability between
    implementations of various versions of HTML, we recommend the following behavior:
    If a user agent encounters an element it does not recognize, it should try to
    render the element's content.
    If a user agent encounters an attribute it does not recognize, it should
    ignore the entire attribute specification (i.e., the attribute and its value).
    If a user agent encounters an attribute value it doesn't recognize, it should
    use the default attribute value.
    If it encounters an undeclared entity, the entity should be treated as
    character data.
    We also recommend that user agents provide support for notifying the user of
    such errors.
    Since user agents may vary in how they handle error conditions, authors and
    users must not rely on specific error recovery behavior.
    [end of quote]
    So such an HTML document is invalid. HTML user agents are recommended to
    ignore the charset attribute, but they are also recommended to report the
    error to the user.
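The Appendix B.1 recommendations can be sketched the same way. This is an illustration in Python under stated assumptions, not a description of how any particular browser behaves: the attribute set below is the one the HTML 4.01 DTD declares for META (lang, dir, http-equiv, name, content, scheme), and the class name LenientMetaReader is hypothetical. Undeclared attributes such as charset are dropped whole, name and value together, and each one is recorded so it can be reported to the user.

```python
from html.parser import HTMLParser

# Attributes the HTML 4.01 DTD declares for META: the %i18n set (lang, dir)
# plus http-equiv, name, content and scheme. Anything else is undeclared.
META_ATTRIBUTES = {"lang", "dir", "http-equiv", "name", "content", "scheme"}


class LenientMetaReader(HTMLParser):
    """Hypothetical reader applying the Appendix B.1 advice to META tags:
    ignore undeclared attributes, but keep a report of each one."""

    def __init__(self):
        super().__init__()
        self.kept = []    # one dict of usable attributes per META tag
        self.errors = []  # messages about ignored attributes

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        line, _offset = self.getpos()  # position where this tag starts
        usable = {}
        for name, value in attrs:
            if name in META_ATTRIBUTES:
                usable[name] = value
            else:
                # Ignore the entire attribute specification (name and value),
                # but record it so the user can be notified.
                self.errors.append(
                    f'line {line}: META has no attribute "{name.upper()}"; ignored'
                )
        self.kept.append(usable)
```

Fed the tag `<META charset=UTF-8 http-equiv=Content-Type content="text/html; charset=utf-8">`, the reader keeps only http-equiv and content and records a single error for CHARSET, so the encoding still comes from the content attribute.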

    Well then, have it from the horse's mouth:
    <http://validator.w3.org/>

    Below are the results of attempting to parse this document with an SGML
    parser.

    Line 4, column 14: there is no attribute "CHARSET".

    <META charset=UTF-8 http-equiv=Content-Type content="text/html; charset=utf-8">

    ==================================
    Frank Yung-Fong Tang
    System Architect, International Development, AOL Interactive Services
    AIM:yungfongta mailto:ytang0648@aol.com Tel:650-937-2913
    Yahoo! Msg: frankyungfongtan

    John 3:16 "For God so loved the world that he gave his one and only Son, that
    whoever believes in him shall not perish but have eternal life."

    Does your software display Thai language text correctly for Thailand users?
    -> Basic Concept of Thai Language linked from Frank Tang's
    Internationalization Secrets
    Want to translate your English text to something Thailand users can
    understand?
    -> Try English-to-Thai machine translation at
    http://c3po.links.nectec.or.th/parsit/



    This archive was generated by hypermail 2.1.5 : Tue Nov 04 2003 - 18:49:29 EST