From: Doug Ewell (dewell@adelphia.net)
Date: Wed Nov 05 2003 - 00:49:11 EST
Peter Kirk <peterkirk at qaya dot org> wrote:
>> ... (a very old, legacy application, unaware of the existence of
>> codepoints above U+FFFF) ...
>
> Such applications are not "very old", they are still being written.
> For example (see http://www.mysql.com/doc/en/Charset-Unicode.html),
> MySQL 4.1 adds UCS-2 and UTF-8 support to previous versions but for
> single two-byte codes in UCS-2 and up to three bytes per UTF-8
> character only :-( - and this is still in alpha!
At the risk of upsetting the open-source faithful, that is just plain
lazy. Anyone who can master the wizardly details of building a powerful
(and commercially successful) database program can figure out how to
slap two surrogates together without destroying performance.
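For readers wondering what "slapping two surrogates together" amounts to, here is a minimal Python sketch. It is not from the original post, and the function name is hypothetical. It uses the standard UTF-16 decoding formula: each surrogate carries 10 bits, and the result is offset by 0x10000.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a supplementary code point."""
    # High surrogates occupy U+D800..U+DBFF, low surrogates U+DC00..U+DFFF.
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate contributes 10 bits; add 0x10000 to reach planes 1-16.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1D11E MUSICAL SYMBOL G CLEF is encoded in UTF-16 as D834 DD1E.
print(hex(combine_surrogates(0xD834, 0xDD1E)))  # 0x1d11e
```

The whole operation is a range check, a shift, and two subtractions.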
Constraining UTF-8 to the BMP is even less defensible, since there is no
performance penalty in allowing four-byte UTF-8 sequences.
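As an illustration of why four-byte sequences are no harder than three-byte ones, here is a minimal Python sketch of a UTF-8 encoder for a single code point. It is not taken from MySQL or from the post, and the function name is hypothetical. The four-byte case is just one more branch of the same bit-packing arithmetic used for the shorter forms.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    # Reject values outside U+0000..U+10FFFF and lone surrogate code points.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Supplementary planes: the case a BMP-only, three-byte limit rejects.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0x1D11E).hex(" "))  # f0 9d 84 9e
```

A decoder that stops at three bytes would reject the sequence above even though producing it takes no extra machinery.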
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 01:50:10 EST