Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Sat Aug 19 2006 - 03:43:39 CDT

    Hello Sinnathurai Srivas,

    you have written:
    > please see the image. I'll appreciate if any one
    > willing to come out and help me document these problems.
    > http://www.araichchi.net/kanini/unicode/fail/u-photoplus-fails.jpg
    > http://www.araichchi.net/kanini/unicode/fail/unicode_status.htm
    >
    > Look for question mark.
    > Look for where the question mark starts, despite the use of Unicode fonts.

    In order to get help with your problems, you should explain what you
    did and what you expected. At the moment, we see only an
    image that displays fine in any browser. In this case we would
    need, at least, a link to the WWW page that you have displayed to
    produce that screen-shot.

    However, I have inspected both your HTML code in
    <http://www.araichchi.net/kanini/unicode/fail/unicode_status.htm>
    and the pertinent HTTP headers from your server, and I have found
    no charset declaration. If you present your UTF-encoded pages
    in the same way, they will of course not display correctly.

    To display a page correctly, the browser must, of course, know
    its encoding; and it is the author's and the server's duty to
    inform the browser about it -- be it ISCII, ISO 8859, UTF, or
    whatever. If there is no explicit declaration of the encoding,
    the browser must assume ASCII (ISO 646 IRV) for HTML 3, and
    ISO 8859-1 for HTML 4 (and above). This is what the Internet
    standards say.

    To use Unicode, e.g. the UTF-8 encoding, in a WWW page, you will
    have to:
    - store your HTML source in UTF-8 encoding,
    - insert the following HTML header:
       <meta http-equiv="content-type" content="text/html; charset=utf-8">
    - configure your server to emit the following HTTP header:
       Content-Type: text/html; charset=utf-8
    And your audience must, of course,
    - have suitable fonts installed, capable of rendering all the
       characters you are using in your page,
    - let their browsers just follow the advice from the HTTP headers,
       and not try a different encoding.

    The HTML meta tag will inform the browser in case the user stores
    that page locally and displays it later (without the aid of your
    server). The other four requirements are essential for the normal
    situation, when a browser displays a remote page.
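
    As a rough illustration of the first three steps, here is a minimal
    Python sketch. It is only an assumption about how one might test this
    locally, not a description of your actual server: the names build_page,
    Utf8Handler and serve, the port 8000, and the sample Tamil text are
    all made up for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The HTML-level declaration, as quoted in the list above.
META = '<meta http-equiv="content-type" content="text/html; charset=utf-8">'


def build_page(text):
    """Return a small HTML page as UTF-8 bytes, with the meta declaration.

    Sketch only: `text` is inserted as-is, without HTML escaping.
    """
    html = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
        "<html><head>" + META + "<title>UTF-8 test</title></head>\n"
        "<body><p>" + text + "</p></body></html>\n"
    )
    # Step 1: the bytes sent (or stored) are UTF-8.
    return html.encode("utf-8")


class Utf8Handler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves the same page for every GET."""

    def do_GET(self):
        body = build_page("தமிழ்")  # arbitrary sample text
        self.send_response(200)
        # Step 3: the HTTP header names the encoding explicitly.
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the test server until interrupted (port is an assumption)."""
    HTTPServer(("localhost", port), Utf8Handler).serve_forever()
```

    Calling serve() and opening http://localhost:8000/ in a browser should
    then show the page with both declarations in place.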

    In contrast, your
    <http://www.araichchi.net/kanini/unicode/fail/unicode_status.htm>
    has no HTML doctype declaration, so the browser assumes HTML 4.01
    transitional, cf. <http://validator.w3.org/docs/help.html#faq-doctype>.
    The HTTP headers for that page specify
    > Content-Type: text/html
    So the browser must assume "charset=iso-8859-1". If your test
    page that displays as u-photoplus-fails.jpg was served in
    the same way, no browser has ever regarded it as Unicode-encoded.
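
    The fallback rule described above can be sketched in a few lines of
    Python. This is an illustration under assumptions, not how any
    particular browser is implemented; effective_charset is a hypothetical
    helper name, and the Tamil string is arbitrary sample text.

```python
from email.message import Message


def effective_charset(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type value, else `default`.

    The default mirrors the rule stated above: no explicit charset
    means the page is treated as ISO 8859-1.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value it finds.
    return msg.get_content_charset(failobj=default)


# The header your server sends today, and the one it should send.
print(effective_charset("text/html"))
print(effective_charset("text/html; charset=utf-8"))

# UTF-8 bytes interpreted under the fallback no longer match the text.
original = "தமிழ்"
misread = original.encode("utf-8").decode(effective_charset("text/html"))
print(misread == original)
```

    The last comparison shows why the declaration matters: the bytes are
    the same, but read as ISO 8859-1 they yield different characters.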

    I am glad that the discussion now turns away from scolding and finds
    its way to solid technical grounds. I am quite confident that most,
    or all, of your complaints can be solved if you just take the pains
    - to study, and comply with, the necessary standards and rules,
    - and, if you hit on an insurmountable problem, to describe it in
       detail.
    A good starting point for your studies is <http://www.unicode.org/faq/>.

    Best wishes,
        Otto Stolz


