From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Jan 09 2008 - 12:23:08 CST
Hello Damon,
you wrote:
> when I type Unicode characters and then go to
> look at the source I see nothing but gobbledy gook hodge podge of odd
> ASCII characters or Character pairs/groups.
You are probably viewing the source under a wrong encoding assumption.
Judging from your description, I guess that you are looking at a
UTF-8-encoded source file as if it were ISO-8859-1-encoded.
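As an illustration, here is a minimal Python sketch of that effect. The sample character (U+1EE1) is an assumption chosen for the example, not taken from your actual file.

```python
# Illustrative sketch: the sample character is an assumption, not real data.
text = "\u1ee1"  # LATIN SMALL LETTER O WITH HORN AND TILDE

# The bytes that a UTF-8-encoded file actually contains for this character.
utf8_bytes = text.encode("utf-8")

# What a viewer displays when it wrongly assumes ISO-8859-1:
# every byte is shown as a separate character.
misread = utf8_bytes.decode("iso-8859-1")

# Reading the very same bytes with the right encoding gives the text back.
restored = misread.encode("iso-8859-1").decode("utf-8")

print(utf8_bytes.hex())   # the three bytes stored for one character
print(len(misread))       # number of characters shown under the wrong assumption
print(restored == text)   # the data itself was never damaged
```

The point of the round trip is that nothing has been recoded: only the interpretation of the bytes differs.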
> What is happening and why/how is the Unicode being recoded or displayed
> in non-unicode format in the source?
Probably, there is nothing non-Unicode in your data:
UTF-8 is one of the three official encoding forms of Unicode
(the others being UTF-16 and UTF-32),
cf. <http://www.unicode.org/faq/utf_bom.html>.
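To make this concrete, the following Python sketch encodes one character in UTF-8, UTF-16 and UTF-32. The sample character and the choice of the big-endian variants (to avoid a byte order mark in the output) are assumptions made for the example.

```python
# Illustrative sketch: one code point, three Unicode encoding forms.
text = "\u1ee1"  # sample character, chosen only for illustration

# Big-endian variants are used so that no byte order mark is prepended.
forms = {name: text.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, encoded in forms.items():
    # Same character, different byte sequences.
    print(name, encoded.hex(), len(encoded), "bytes")

# Each form decodes back to the identical character.
all_equal = all(data.decode(name) == text for name, data in forms.items())
print(all_equal)
```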
> Is there a proper source editor
> that will display the actual Unicode encodings?
Try <http://www.alanwood.net/unicode/utilities.html>, for a starter.
> Then there's OpenOffice... I have had to actually submit a bug to OOo
> because when I use it to read directly from my database which is storing
> correctly escaped HTML unicode it converts all of my ampersand escape
> characters to &amp; so &#7905; becomes &amp;#7905. That one just baffles
> me, as they are supposed to be supporting Unicode, but convert my
> Unicode and then don't even convert it to Unicode but use &amp; instead.
If you want to display an ampersand sign in HTML, you must use
«&amp;», «&#38;», or «&#x26;» instead,
cf. <http://www.w3.org/TR/html401/sgml/entities.html>. Probably, your
database software thinks that the ampersands are meant literally, and
provides their correct HTML equivalent. I guess that your database
is not correctly configured, or your write and read requests
have incompatible parameters, or your data was corrupted
before it even entered the database (cf. next paragraph).
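The double escaping can be reproduced with Python's standard html module. This is only a sketch of the general mechanism, assuming the stored value is the NCR for U+1EE1. It does not claim to show what OpenOffice or your database actually do internally.

```python
import html

# Assumed stored value: a numeric character reference for U+1EE1.
stored = "&#7905;"

# Software that takes the ampersand literally escapes it once more.
escaped_again = html.escape(stored)

# Decoding the original value yields the intended character ...
decoded_once = html.unescape(stored)

# ... whereas decoding the doubly escaped value only gives the NCR text back.
decoded_double = html.unescape(escaped_again)

print(escaped_again)
print(decoded_once == "\u1ee1")
print(decoded_double)
```

If a value is escaped on writing, it must be unescaped exactly once on reading. A mismatch between the two produces the symptom you describe.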
If you enter your data via a browser, you may hit yet another
quirk: most browsers convert characters that cannot be represented
in the page's encoding into NCRs (numeric character references).
To avoid this quirk, you will have to set the encoding of the WWW
page to UTF-8 (or to one of the other UTFs mentioned above), because
this encoding comprises all characters contained in all popular
encodings. This means:
1. you have to include the following line early in the head of your
HTML source:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
2. you have to store the HTML source for your form in UTF-8,
3. you have to configure your HTTP server so that it tags the page
as UTF-8-encoded when serving it to your browser.
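The following Python sketch imitates the NCR quirk and the three steps above. Python's "xmlcharrefreplace" error handler is used here only as a stand-in for the browser behaviour. The page content and the build_response helper are hypothetical and made up for this example. How you set the header on a real HTTP server depends on that server.

```python
# Illustrative sketch; PAGE and build_response are hypothetical names.
text = "\u1ee1"  # sample character that ISO-8859-1 cannot represent

# Stand-in for the browser quirk: an unencodable character becomes an NCR.
as_latin1 = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# With UTF-8 as the page encoding, no substitution is needed.
as_utf8 = text.encode("utf-8")

# Step 1: the meta line sits early in the head of the HTML source.
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    "<title>Form</title></head>\n"
    '<body><form method="post"><input name="q"></form></body></html>\n'
)

def build_response(page):
    """Return (headers, body) for a page served as UTF-8."""
    body = page.encode("utf-8")  # step 2: the source is stored as UTF-8
    headers = {
        # Step 3: the server tags the page as UTF-8-encoded.
        "Content-Type": "text/html; charset=UTF-8",
        "Content-Length": str(len(body)),
    }
    return headers, body

headers, body = build_response(PAGE)
print(as_latin1)
print(as_utf8.hex())
print(headers["Content-Type"])
```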
Good luck,
Otto Stolz