Re: UTF-16 inside UTF-8

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Nov 05 2003 - 00:49:11 EST


    Peter Kirk <peterkirk at qaya dot org> wrote:

    >> ... (a very old, legacy application, unaware of the existence of
    >> codepoints above U+FFFF) ...
    >
    > Such applications are not "very old", they are still being written.
    > For example (see http://www.mysql.com/doc/en/Charset-Unicode.html),
    > MySQL 4.1 adds UCS-2 and UTF-8 support to previous versions but for
    > single two-byte codes in UCS-2 and up to three bytes per UTF-8
    > character only :-( - and this is still in alpha!

    At the risk of upsetting the open-source faithful, that is just plain
    lazy. Anyone who can master the wizardly details of building a powerful
    (and commercially successful) database program can figure out how to
    slap two surrogates together without destroying performance.
    Constraining UTF-8 to the BMP is even less defensible, since there is no
    performance penalty in allowing four-byte UTF-8 sequences.
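
To make the point concrete, here is a minimal sketch in Python of both operations: combining a surrogate pair into one code point, and encoding a code point as UTF-8 with four-byte sequences allowed. The function names combine_surrogates and encode_utf8 are hypothetical, chosen for illustration; this is not MySQL's code, and it says nothing about how any particular database stores strings.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate")
    # Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8, including four-byte sequences."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("lone surrogates are not valid in UTF-8")
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    if cp <= 0x10FFFF:
        # The supplementary-plane case: one more branch, one more byte.
        return bytes([
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    raise ValueError("code point beyond U+10FFFF")


# U+1D11E MUSICAL SYMBOL G CLEF is the surrogate pair D834 DD1E in UTF-16.
clef = combine_surrogates(0xD834, 0xDD1E)
clef_utf8 = encode_utf8(clef)
```

The four-byte case is just one extra branch of the same shift-and-mask pattern used for the three-byte case, which is the sense in which supporting it costs essentially nothing.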

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


