From: Mike Ayers (email@example.com)
Date: Tue Apr 11 2006 - 16:12:47 CST
> Which one of these looks like a proper UTF-8 character: é or Ã© ?
Neither. There is no such thing as a "UTF-8 character", just "UTF-8
encoded Unicode data". In most cases I would be nitpicking to point
this out, but in this case I think it is the cause of your problem:
Characters:                  é     Ã     ©
Unicode code points (dec):   233   195   169
Unicode code points (hex):   E9    C3    A9
It is interesting to note that C3 A9 is the UTF-8 encoding of E9.
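If you want to check that for yourself, here is a minimal sketch in Python. It is not part of the original exchange, and the variable names are only illustrative:

```python
# U+00E9 (e with acute accent) encoded as UTF-8 yields the two bytes C3 A9.
e_acute = "\u00e9"
utf8_bytes = e_acute.encode("utf-8")
print(utf8_bytes.hex(" ").upper())  # C3 A9

# Reading those same two bytes as ISO 8859-1 gives one character per byte:
# U+00C3 (A with tilde) followed by U+00A9 (copyright sign).
misread = utf8_bytes.decode("iso-8859-1")
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa9']
```

The bytes never change; only the encoding used to interpret them does.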
> Basically, if I enter the character 'é' (egrave) into my database, when
> trying to display it on a webpage, it displays as a '?'. If I try to enter
> it as 'Ã©' It displays ok. So does this mean that the correct way to type
> an 'é' is to actually type 'Ã©'?
No. It means that you should not handle text as binary. What you are
doing is entering ISO 8859-1 characters (bytes) from one end, then
interpreting the same stream as UTF-8 encoded Unicode at the other,
which is why you have to enter gobbledygook in order to get the result
you want.
My guess is that your database is in ISO 8859-1 format, and your web
page declares UTF-8 (there are many ways to get this particular error,
so this is only a guess). What you need to do is verify that your data is being
extracted from the database as UTF-8 data. The storage fields are of
type N* (e.g. NVARCHAR), correct?
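To make the suspected mismatch concrete, here is a rough sketch in Python. It assumes my guess is right (ISO 8859-1 bytes in storage, UTF-8 declared on the page); the variable names are hypothetical, and your actual database driver may behave differently:

```python
# Assumption: the database hands back ISO 8859-1 bytes, while the web page
# tells the browser to decode those bytes as UTF-8.
stored = "\u00e9".encode("iso-8859-1")  # b'\xe9', a single byte
shown = stored.decode("utf-8", errors="replace")
print(shown == "\ufffd")  # True: a lone E9 is not valid UTF-8

# Typing U+00C3 U+00A9 instead stores the bytes C3 A9, which happen to be
# the valid UTF-8 encoding of U+00E9, so the page appears correct.
stored_workaround = "\u00c3\u00a9".encode("iso-8859-1")
print(stored_workaround.decode("utf-8") == "\u00e9")  # True

# The real fix is to transcode at the boundary instead of retyping the text.
fixed = stored.decode("iso-8859-1").encode("utf-8")
print(fixed.hex(" ").upper())  # C3 A9
```

The replacement character U+FFFD in the first step is what many browsers draw as a '?'.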