Re: How do I type unicode characters?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Apr 11 2006 - 19:52:24 CST


    From: <tom.kirkpatrick@virusbtn.com>
    > Mike,
    >
    > thanks, that really helped clear a few things up for me and with this
    > knowledge I have just found the source of (one of) my problems - which is
    > that the data I was trying to enter into the database had been saved in
    > ANSI format. I have resaved the sql script and reimported into my database
    > (MySQL) which appears to have cured the random '?' problem...
    >
    > However, although that é (eacute) which was previously displaying as '?'
    > is now displaying correctly directly on my webpage, when I try to show it
    > in a web form element (a drop down menu), it now displays as Ã© ! So
    > somewhere along the lines it must be being converted back to ISO 8859-1
    > right? My web browser knows that the page is in Unicode. I think there is
    > a possibility that the code that is being used to generate these form elements
    > may be doing this. If it's not the form generation code, then... well it
    > must be, as this is the only thing that is different from displaying
    > normally on the page.
    >
    >> The storage fields are of type N* (e.g. NVARCHAR), correct?
    >
    > It's a MySQL database (although an old one - v 4.0.26) and the storage
    > fields are of type VARCHAR. As far as I know this version of MySQL doesn't
    > have much support for encodings, and I'm not sure what encoding it is
    > currently set to, but I assume that if I enter data as UTF-8 then it will
    > be stored as UTF-8 right? If not, then I need to set the database to store
    > things as Unicode somehow right?

    It's not enough! The database must know how the stored text is ordered (for the SQL clause ORDER BY) and compared (for case-insensitive searches). It can do that only if it knows how characters are mapped to the sequences of bytes it stores.
    So the database (or table) is *created* with an initial encoding that cannot be changed and is supposed to be uniform within each table.
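
    To illustrate why the encoding matters for comparisons, here is a minimal Python sketch (plain Python, not MySQL code; the helper `ci_equal` is hypothetical). The same word is stored as UTF-8 bytes in upper and lower case. Comparing raw bytes cannot match "É" with "é", and a case-insensitive comparison only works when the bytes are decoded with the charset they were actually stored in.

```python
def ci_equal(a: bytes, b: bytes, charset: str) -> bool:
    """Hypothetical case-insensitive comparison of two stored byte strings.

    The result is only meaningful if `charset` is the encoding the bytes
    were really stored in.
    """
    return a.decode(charset).casefold() == b.decode(charset).casefold()


upper = "ÉTÉ".encode("utf-8")
lower = "été".encode("utf-8")

print(upper.lower() == lower)                # False: bytes.lower() only folds ASCII
print(ci_equal(upper, lower, "utf-8"))       # True: correct charset
print(ci_equal(upper, lower, "iso-8859-1"))  # False: wrong charset, wrong characters
```

    A real server would apply a collation rather than Python's `casefold()`, but the dependency is the same: without knowing the byte-to-character mapping, it cannot fold case or order text correctly.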

    In addition, the *client* must indicate to the server how its characters are encoded. If the server does not provide the necessary conversion, then the client must be set up to use the same encoding scheme or charset as the server.

    If the server provides a conversion service, it will first convert the requested data to the encoding scheme specified by the client, then return it to the client.
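
    A minimal Python sketch of that conversion step, assuming a server that stores UTF-8 (the function `fetch` is hypothetical, not part of any database API). It also reproduces one plausible explanation of the "Ã©" symptom described in the question: the two UTF-8 bytes of "é" are delivered unconverted and then read as ISO-8859-1.

```python
def fetch(stored: bytes, server_charset: str, client_charset: str) -> bytes:
    """Hypothetical server-side conversion on retrieval: decode with the
    charset the data was stored in, re-encode in the charset the client declared."""
    return stored.decode(server_charset).encode(client_charset)


stored = "é".encode("utf-8")  # assumption: the server stores UTF-8

# The client declared ISO-8859-1, so the server converts before replying.
reply = fetch(stored, "utf-8", "iso-8859-1")
print(reply, reply.decode("iso-8859-1"))  # b'\xe9' é

# No conversion, but the client assumes ISO-8859-1: each of the two
# UTF-8 bytes is shown as a separate character.
print(stored.decode("iso-8859-1"))        # Ã©
```

    In this sketch, a character that the client charset cannot represent raises `UnicodeEncodeError`; a real server may instead substitute a replacement character such as '?'.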

    So the server must be configured first, when the database is created. Then the client must either be configured to use the same charset as the server (so the client performs any conversion), or specify to the server the charset it uses (so the server performs the conversion). The second option is generally the best solution: the server can check the charset declared by the client and convert the textual data if necessary.
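
    The "server checks what the client declared" option can be sketched in Python as follows. `receive` is a hypothetical name, and UTF-8 as the storage charset is an assumption. The server decodes incoming bytes strictly with the charset the client declared, rejects them if they are invalid, and converts them to its storage charset.

```python
def receive(data: bytes, client_charset: str, server_charset: str = "utf-8") -> bytes:
    """Hypothetical server-side handling of text sent by a client that has
    declared `client_charset`: validate the bytes, then convert for storage."""
    try:
        text = data.decode(client_charset)  # strict decoding rejects invalid sequences
    except UnicodeDecodeError as err:
        raise ValueError(f"data is not valid {client_charset}") from err
    return text.encode(server_charset)


# Latin-1 bytes correctly declared as ISO-8859-1: accepted and converted.
print(receive(b"\xe9t\xe9", "iso-8859-1"))  # b'\xc3\xa9t\xc3\xa9'

# The same bytes wrongly declared as UTF-8 (e.g. a script saved as "ANSI"): rejected.
try:
    receive(b"\xe9t\xe9", "utf-8")
except ValueError as e:
    print(e)  # data is not valid utf-8
```

    Note that such a check can only catch mislabeled data for charsets that have invalid byte sequences, such as UTF-8; every byte sequence is valid ISO-8859-1, so UTF-8 data mislabeled as ISO-8859-1 would pass silently and be double-encoded.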

    The SQL "NVARCHAR" datatype does not by itself indicate the charset used by the server; it is just a way to specify the storage space that will be used. An SQL engine may choose to implement NVARCHAR using 16-bit code units or 8-bit code units, or this may depend on the settings of the database. Using NVARCHAR rather than VARCHAR is normally not necessary; the distinction exists mainly to allow an optimization in access times (notably in indexes and sorting) when support for the full Unicode charset is not needed. CHAR and VARCHAR are typically used for reference keys that use a limited subset of characters, such as ASCII letters or digits, when these keys are generated by a program or come from a strict nomenclature of codes. In that case the (VAR)CHAR column does not represent arbitrary text, whereas NVARCHAR forces the server to adopt a Unicode-compatible behavior, notably in collation and case transformations.
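
    To make the storage-space point concrete, here is a small Python sketch (not tied to any particular SQL engine; `code_units` is a hypothetical helper) that counts how many code units a value needs with 8-bit (UTF-8) versus 16-bit (UTF-16) code units. An ASCII reference key costs the same number of units either way, while accented, CJK, or supplementary-plane characters do not.

```python
def code_units(text: str) -> dict:
    """Count the code units `text` would need in two common storage choices."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),            # 8-bit code units
        "utf16_units": len(text.encode("utf-16-le")) // 2,  # 16-bit code units, no BOM
    }


# AB-1234: 7 / 7, été: 5 / 3, 日本: 6 / 2, 𝄞 (U+1D11E): 4 / 2
for s in ["AB-1234", "été", "日本", "𝄞"]:
    print(s, code_units(s))
```

    The last value shows that even with 16-bit code units, a character outside the Basic Multilingual Plane still takes two units (a surrogate pair), so a length limit counted in code units is not always a limit in characters.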



    This archive was generated by hypermail 2.1.5 : Tue Apr 11 2006 - 19:54:00 CST