Re: utf-8 and databases

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jul 08 2002 - 02:58:07 EDT


At 02:11 PM 7/7/02 +0700, Paul Hastings wrote:
>is there a standard test that can determine whether a given
>database can handle utf-8 (ie as "native" utf-8 not converting
>to ucs-2 or whatever)?

Why is that of any interest?

The primary concern is whether a database can represent the entire
repertoire of Unicode. Just create a string that contains the largest
character, U+10FFFD, convert it to whatever encoding form the APIs
require, and see whether you get it back unmolested.
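
A minimal sketch of that round trip, in Python, with the standard sqlite3
module standing in for whatever database is actually under test (the table
and the probe string are placeholders of mine, purely for illustration):

    import sqlite3

    # Probe string: ASCII plus a 2-byte, a 3-byte and the largest character
    # U+10FFFD (a 4-byte sequence in UTF-8).
    probe = "abc \u00e9 \u4e2d \U0010FFFD"

    conn = sqlite3.connect(":memory:")   # stand-in for the database under test
    conn.execute("CREATE TABLE t (s TEXT)")
    conn.execute("INSERT INTO t (s) VALUES (?)", (probe,))
    (result,) = conn.execute("SELECT s FROM t").fetchone()

    # The string must come back unmolested: no replacement characters,
    # no truncation, no re-encoding artifacts.
    assert result == probe, "database mangled the probe string"
    print("repertoire round trip OK")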

A more sophisticated test would take a longer string and attempt to sniff
out incorrect truncation of characters.
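
One way to phrase that check, again only a sketch (the helper, the
byte-boundary reasoning and the sample text are assumptions of mine): push
a long run of multibyte characters through the same insert/select path as
above and verify that whatever comes back is, at worst, a prefix cut on a
character boundary.

    def truncation_ok(original: str, returned: str) -> bool:
        """True if `returned` is the full original or a prefix of it cut
        on a character boundary; False if a character was split or the
        data was otherwise corrupted."""
        if returned == original:
            return True
        if not original.startswith(returned):
            return False                   # corrupted, not merely truncated
        try:
            returned.encode("utf-8")       # a byte-level cut that surfaced as
        except UnicodeEncodeError:         # a lone surrogate will fail here
            return False
        return True

    # Example input: a long run of 3-byte and 4-byte characters to feed
    # through the insert/select path from the previous sketch.
    long_text = "\u4e2d\U00010000" * 4096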

A secondary concern is performance. If the choice of encoding form is a
poor match for the actual data encountered, and if entering and retrieving
the data requires too many transcoding steps, it's conceivable that this
could be detected in the overall performance of the database.

However, there's no reason to assume that a theoretical match in encoding
efficiency translates automatically into a more efficient database
implementation.
Therefore, regular benchmarking tools should be fine for determining
database performance, as long as the test data is representative of the
installation.
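
For what it's worth, a crude sketch of what "benchmarking with
representative data" can look like (the sample rows, table and timing
method are all placeholders; a real test would use the site's actual
workload and tooling):

    import sqlite3
    import time

    # Representative-ish sample data: mostly ASCII with some multibyte text.
    rows = [("mostly ASCII with a little \u00e9, \u4e2d and \U0010FFFD",)] * 10000

    conn = sqlite3.connect(":memory:")   # stand-in for the database under test
    conn.execute("CREATE TABLE t (s TEXT)")

    start = time.perf_counter()
    conn.executemany("INSERT INTO t (s) VALUES (?)", rows)
    fetched = conn.execute("SELECT s FROM t").fetchall()
    print(f"{len(fetched)} rows inserted and read back "
          f"in {time.perf_counter() - start:.3f}s")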

A./


