Re: How do I type unicode characters?

From: tom.kirkpatrick@virusbtn.com
Date: Tue Apr 11 2006 - 17:38:36 CST

    Mike,

    thanks, that really helped clear a few things up for me, and with this
    knowledge I have just found the source of (one of) my problems - which is
    that the data I was trying to enter into the database had been saved in
    ANSI format. I have resaved the sql script and reimported into my database
    (MySQL) which appears to have cured the random '?' problem...

    However, although the 'é' (eacute) that was previously displaying as '?'
    is now displaying correctly directly on my webpage, when I try to show it
    in a web form element (a drop-down menu), it now displays as 'Ã©'! So
    somewhere along the line it must be being converted back to ISO 8859-1,
    right? My web browser knows that the page is in Unicode. I think it is
    possible that the code used to generate these form elements is doing
    this. If it's not the form generation code, then... well, it must be, as
    this is the only thing that differs from displaying the text normally on
    the page.
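
One way such a conversion could happen, sketched in Python purely as an illustration: the thread does not say what language or library generates the form elements, and the helper names below are hypothetical. If the form code escapes the raw UTF-8 bytes one at a time, as though each byte were an ISO 8859-1 character, a single 'é' comes out as two HTML entities.

```python
from html.entities import codepoint2name


def escape_as_latin1(raw: bytes) -> str:
    """Hypothetical form helper: treats every byte as one ISO 8859-1 character."""
    return "".join(
        f"&{codepoint2name[b]};" if b in codepoint2name else chr(b)
        for b in raw
    )


def escape_as_utf8(raw: bytes) -> str:
    """Same escaping, but the bytes are decoded as UTF-8 first."""
    return "".join(
        f"&{codepoint2name[ord(c)]};" if ord(c) in codepoint2name else c
        for c in raw.decode("utf-8")
    )


page_bytes = "\u00e9".encode("utf-8")   # 'é' as it sits in a UTF-8 page

mangled = escape_as_latin1(page_bytes)  # two entities, rendered as 'Ã©'
correct = escape_as_utf8(page_bytes)    # one entity, rendered as 'é'

print(mangled)
print(correct)
```

If the real form code behaves like the first helper, the fix would be to tell it the input is UTF-8 rather than to change the data.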

    > The storage fields are of type N* (e.g. NVARCHAR), correct?

    It's a MySQL database (although an old one - v 4.0.26) and the storage
    fields are of type VARCHAR. As far as I know this version of MySQL doesn't
    have much support for encodings, and I'm not sure what encoding it is
    currently set to, but I assume that if I enter data as UTF-8 then it will
    be stored as UTF-8, right? If not, then I need to set the database to
    store things as Unicode somehow, right?

    Mike Ayers <mayers@celequest.com>
    Sent by: unicode-bounce@unicode.org
    11/04/2006 23:12

    To: tom.kirkpatrick@virusbtn.com
    cc: unicode@unicode.org
    Subject: Re: How do I type unicode characters?

    tom.kirkpatrick@virusbtn.com wrote:

    > Which one of these looks like a proper UTF-8 character: é or Ã© ?

    Neither. There is no such thing as a "UTF-8 character", just "UTF-8
    encoded Unicode data". In most cases I would be nitpicking to point
    this out, but in this case I think it is the cause of your problem:

    Characters:          é   Ã   ©
    Unicode code points: 233 195 169
    Unicode hex points:  E9  C3  A9

    It is interesting to note that the byte sequence C3 A9 is the UTF-8
    encoding of code point E9 (U+00E9, 'é').
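
The relationship in the table above can be checked with a short sketch. Python is an assumption here (the thread names no language); the snippet only uses the standard string encode/decode methods.

```python
# U+00E9 (é) as a Unicode string.
text = "\u00e9"

# Encoding it as UTF-8 produces two bytes.
utf8_bytes = text.encode("utf-8")

# Reading those same two bytes as ISO 8859-1 produces two characters.
misread = utf8_bytes.decode("iso-8859-1")

print(" ".join(f"{b:02X}" for b in utf8_bytes))  # hex values of the UTF-8 bytes
print([ord(c) for c in misread])                 # code points of the misread text
```

The two bytes are C3 and A9, and the misread string consists of code points 195 and 169, which is the 'Ã©' pair shown in the table.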

    > Basically, if I enter the character 'é' (eacute) into my database, when
    > trying to display it on a webpage, it displays as a '?'. If I try to
    > enter it as 'Ã©' it displays ok. So does this mean that the correct way
    > to type an 'é' is to actually type 'Ã©'?

    No. It means that you should not handle text as binary. What you are
    doing is entering ISO 8859-1 characters (bytes) from one end, then
    interpreting the same stream as UTF-8 encoded Unicode at the other,
    which is why you have to enter gobbledygook in order to get the result
    you desire.
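
A minimal sketch of that mismatch, again assuming Python for illustration. It models the database as a plain byte store, which is an assumption about the setup and not something established in this thread.

```python
# Going in: 'é' saved as ISO 8859-1 is the single byte E9.
stored = "\u00e9".encode("iso-8859-1")

# Coming out: E9 on its own is not a valid UTF-8 sequence, so a lenient
# decoder substitutes U+FFFD, which many displays show as '?'.
shown = stored.decode("utf-8", errors="replace")

# The workaround: typing 'Ã©' in ISO 8859-1 stores the bytes C3 A9 ...
workaround = "\u00c3\u00a9".encode("iso-8859-1")

# ... which happen to be the valid UTF-8 encoding of 'é'.
shown_workaround = workaround.decode("utf-8")

print(repr(shown))
print(repr(shown_workaround))
```

Using the same encoding at both ends removes the need for the workaround.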

    My guess is that your database is in ISO 8859-1 format, and your web
    page declares UTF-8 (there are many ways to get this particular error,
    so I guess). What you need to do is verify that your data is being
    extracted from the database as UTF-8 data. The storage fields are of
    type N* (e.g. NVARCHAR), correct?

    HTH,

    /|/|ike

    -- 
    Tom Kirkpatrick
    Web Developer - Virus Bulletin
    

