Re: Viewing Source...

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Jan 09 2008 - 12:23:08 CST

    Hello Damon,

    you wrote:
    > when I type Unicode characters and then go to
    > look at the source I see nothing but gobbledy gook hodge podge of odd
    > ASCII characters or Character pairs/groups.

    You are probably viewing the source under the wrong encoding.
    From your description, I guess that you are looking at a
    UTF-8-encoded source file as if it were ISO-8859-1-encoded.
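
    A minimal sketch of that effect in Python. The character (U+1EE1,
    i.e. the one behind &#7905;) and the two encodings are assumptions
    chosen to match your description; your actual editor and data may
    differ.

```python
# Illustration only: the character and both encodings are assumptions.
original = chr(7905)  # U+1EE1, the character behind &#7905;

utf8_bytes = original.encode("utf-8")      # the bytes stored in the file
misread = utf8_bytes.decode("iso-8859-1")  # same bytes, wrong interpretation

# One character is stored as three bytes ...
print(utf8_bytes.hex(" "))
# ... and each byte is shown as a separate Latin-1 character.
print([f"U+{ord(c):04X}" for c in misread])

# Nothing is lost: reading the same bytes as UTF-8 restores the text.
repaired = misread.encode("iso-8859-1").decode("utf-8")
print(repaired == original)
```

    The data itself is intact in this scenario; only the viewer's
    choice of encoding has to change.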

    > What is happening and why/how is the Unicode being recoded or displayed
    > in non-unicode format in the source?

    Probably, there is nothing non-Unicode in your data:
    UTF-8 is one of the three official encoding forms of Unicode
    (UTF-8, UTF-16, and UTF-32),
    cf. <http://www.unicode.org/faq/utf_bom.html>.

    > Is there a proper source editor
    > that will display the actual Unicode encodings?

    Try <http://www.alanwood.net/unicode/utilities.html>, for a starter.

    > Then there's OpenOffice... I have had to actually submit a bug to OOo
    > because when I use it to read directly from my database which is storing
    > correctly escaped HTML unicode it converts all of my ampersand escape
    > characters to &amp; so &#7905; becomes &amp;7905. That one just baffles
    > me, as they are supposed to be supporting Unicode, but convert my
    > Unicode and then don't even convert it to Unicode but use &amp; instead.

    If you want to display an ampersand sign in HTML, you must write
    «&amp;», «&#x26;», or «&#38;» instead,
    cf. <http://www.w3.org/TR/html401/sgml/entities.html>. Probably, your
    database software thinks that the ampersands are meant literally, and
    provides their correct HTML equivalent. I guess your database
    is not correctly configured, or your write and read requests
    have incompatible parameters, or your data was corrupted
    before it even entered the database (cf. next paragraph).
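
    The double escaping can be reproduced with Python's standard html
    module. This is only a sketch of the general mechanism; I do not
    know what OpenOffice or your database driver does internally, and
    the variable names are made up for the illustration.

```python
import html

stored = "&#7905;"  # text that already contains an NCR

# Software that takes the string literally escapes the ampersand again.
double_escaped = html.escape(stored)

# A browser would then show the NCR as visible text, not as the character.
rendered = html.unescape(double_escaped)

# Storing the character itself and escaping only on output avoids this.
clean = html.unescape(stored)      # the real character, U+1EE1
escaped_once = html.escape(clean)  # unchanged: nothing here needs escaping

print(double_escaped)
print(rendered)
print(escaped_once == clean)
```

    In other words, the software on one side treats the stored text as
    HTML markup, and the software on the other side treats it as plain
    text.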

    If you enter your data via a browser, you may run into yet another
    quirk: most browsers convert characters that cannot be encoded in
    the page's current encoding into NCRs (numeric character
    references such as «&#7905;»). To avoid this quirk, you will
    have to set the encoding of the WWW page to UTF-8 (or to one of the
    other UTFs mentioned above), because this encoding comprises all
    characters contained in all popular encodings. This means:
    1. you have to include the following line early in the head of your
        HTML source:
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    2. you have to store the HTML source for your form in UTF-8,
    3. you have to configure your HTTP server so that it tags the page
        as UTF-8-encoded when serving it to your browser.
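
    The quirk and the three steps above can be sketched in Python.
    Python's "xmlcharrefreplace" error handler only approximates what
    browsers do when submitting a form; submit_form_value,
    build_response, and PAGE are hypothetical names for this
    illustration, not part of any server software.

```python
# Hypothetical form page; step 1: the meta line early in the head.
PAGE = (
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    "<title>Form</title></head>\n"
    '<body><form method="post"><input name="word"></form></body></html>\n'
)


def submit_form_value(text, page_encoding):
    """Approximate how a browser encodes a submitted form value.

    Characters missing from page_encoding are replaced by decimal NCRs,
    which is what Python's "xmlcharrefreplace" handler does.
    """
    return text.encode(page_encoding, errors="xmlcharrefreplace")


def build_response(page):
    """Return (headers, body) for a page served as UTF-8.

    Step 2: the body bytes are UTF-8.
    Step 3: the HTTP header tags the page as UTF-8.
    """
    headers = {"Content-Type": "text/html; charset=UTF-8"}
    return headers, page.encode("utf-8")


word = chr(7905)  # U+1EE1, not representable in ISO-8859-1

# On an ISO-8859-1 page the character arrives as an NCR ...
print(submit_form_value(word, "iso-8859-1"))
# ... on a UTF-8 page it arrives as the character's own three bytes.
print(submit_form_value(word, "utf-8"))

headers, body = build_response(PAGE)
print(headers["Content-Type"])
```

    With the page in UTF-8, the value reaches your database as the
    character itself, so there is no NCR left to be escaped a second
    time.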

    Good luck,
       Otto Stolz


