Re: Displaying unicode in browser

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Tue May 06 2003 - 09:13:33 EDT


    SRIDHARAN Aravind wrote:

    > e.g, For á, the unicode equivalent is \u00e1.
    > And when I display this character on browser what I get is \u00e1
    > instead of á.

    - You have to use one of the HTML forms rather than a Java Unicode
       escape. Some of these were pointed out by others; I'd like to give
       the full picture.

       Choose from the following options:
       · Encode the character in a suitable character encoding ("charset",
         in HTTP lingo). For your example, ISO 8859-1 is suitable (as for
         all characters from U+0000 through U+00FF), cf.
         <http://czyborra.com/charsets/iso8859.html#ISO-8859-1>.
         UTF-8 is suitable for all Unicode characters, cf.
         <http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-04.txt>.
       · Use a character entity, e. g. "&aacute;" (case sensitive), cf.
         <http://www.w3.org/TR/html401/sgml/entities.html#h-24.1>.
       · Use a Numeric Character Reference (NCR),
         cf. <http://www.w3.org/TR/html401/charset.html#h-5.3.1>.
         These come in two flavours:
         - decimal, e. g. "&#225;",
         - hexadecimal, e. g. "&#xe1;" (case insensitive).
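
    The two NCR flavours can be derived from the code point on the
    server side. The following is only a minimal sketch; the class and
    method names (NcrForms, decimalNcr, hexNcr) are made up for
    illustration and are not part of any library.

```java
public class NcrForms {
    // Decimal numeric character reference for one Unicode code point.
    static String decimalNcr(int codePoint) {
        return "&#" + codePoint + ";";
    }

    // Hexadecimal numeric character reference for one Unicode code point.
    static String hexNcr(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    public static void main(String[] args) {
        // U+00E1, the character from the question, written as a Java escape.
        int aAcute = "\u00e1".codePointAt(0);
        System.out.println(decimalNcr(aAcute)); // prints &#225;
        System.out.println(hexNcr(aAcute));     // prints &#xe1;
    }
}
```

    The Java escape \u00e1 is resolved by the Java compiler; what goes
    into the HTML output is the reference string, not the escape.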

    - You have to tell the browser about the encoding you are using,
       cf. <http://www.w3.org/TR/html401/charset.html#h-5.2.2>.
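
    As a rough sketch of what "telling the browser" amounts to, the
    following builds a Content-Type value and a matching META element,
    and encodes the page in the same charset. The class and helper
    names (CharsetDeclaration, contentType, page, encode) are
    hypothetical; how the header is actually sent depends on your
    server setup, which is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDeclaration {
    // Value for the HTTP Content-Type header, naming the encoding actually used.
    static String contentType(Charset cs) {
        return "text/html; charset=" + cs.name();
    }

    // Minimal page whose META element repeats the same declaration.
    static String page(Charset cs, String body) {
        return "<html><head><meta http-equiv=\"Content-Type\" content=\""
                + contentType(cs) + "\"><title>Test</title></head>"
                + "<body><p>" + body + "</p></body></html>";
    }

    // Bytes to send: the page encoded in the declared charset.
    static byte[] encode(Charset cs, String body) {
        return page(cs, body).getBytes(cs);
    }

    public static void main(String[] args) {
        System.out.println(contentType(StandardCharsets.ISO_8859_1));
        System.out.println(contentType(StandardCharsets.UTF_8));

        // What a reader sees when the declaration does not match the bytes:
        // UTF-8 bytes for U+00E1 interpreted as ISO 8859-1 give two characters.
        byte[] utf8 = "\u00e1".getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // prints 2
    }
}
```

    The point of the sketch is only that the declared charset and the
    charset used to produce the bytes have to be the same one.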

    Examples for these techniques can be found at:
    - <http://www.rz.uni-konstanz.de/Antivirus/tests/Euro-Latin-1.htm>
    - <http://www.rz.uni-konstanz.de/Antivirus/tests/Euro-Latin-9.htm>
    - <http://www.rz.uni-konstanz.de/Antivirus/tests/Euro-UTF.htm>
    - <http://www.rz.uni-konstanz.de/Antivirus/tests/Go-Latin.htm>
    - <http://www.rz.uni-konstanz.de/Antivirus/tests/Go-UTF.htm>
    (For the latter two, you need a CJK font to display all characters).
    Look also at the HTML source of these examples.

    - Note that Netscape 4.7 (and earlier versions) displays character
       entities and NCRs only if the characters could also be encoded in
       the character encoding (charset) in use. To be on the safe side,
       you can declare UTF-8 encoding and type your whole HTML source in
       ASCII, using entities/NCRs for all non-ASCII characters.
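
    The safe-side approach could be sketched in Java roughly as
    follows: declare UTF-8, and replace every non-ASCII character by a
    decimal NCR so that the source itself stays pure ASCII. The names
    AsciiSafeHtml and toAsciiWithNcrs are invented for this example,
    and the sketch deliberately does not escape markup characters such
    as "<" or "&".

```java
import java.nio.charset.StandardCharsets;

public class AsciiSafeHtml {
    // Replace every non-ASCII code point by a decimal NCR; ASCII passes
    // through unchanged. Markup characters ('<', '&', ...) are NOT escaped.
    static String toAsciiWithNcrs(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String html =
            "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n"
            + "<p>" + toAsciiWithNcrs("\u00e1 and \u20ac") + "</p>";
        System.out.println(html);
        // Check that the generated source really is pure ASCII.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode(html));
    }
}
```

    Iterating over code points rather than chars keeps characters
    outside the Basic Multilingual Plane in one NCR instead of two
    surrogate halves.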

    Best wishes,
       Otto Stolz


