Re: UTF-16 inside UTF-8

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Nov 05 2003 - 00:49:11 EST

Next message: Abdij Bhat: "RE: UTF8 and COntrol Characters"

Previous message: Abdij Bhat: "UTF8 and COntrol Characters"
In reply to: Peter Kirk: "Re: UTF-16 inside UTF-8"
Next in thread: Jungshik Shin: "Re: UTF-16 inside UTF-8"
Reply: Jungshik Shin: "Re: UTF-16 inside UTF-8"
Reply: Peter Kirk: "Re: UTF-16 inside UTF-8"

Peter Kirk <peterkirk at qaya dot org> wrote:

>> ... (a very old, legacy application, unaware of the existence of
>> codepoints above U+FFFF) ...
>
> Such applications are not "very old", they are still being written.
> For example (see http://www.mysql.com/doc/en/Charset-Unicode.html),
> MySQL 4.1 adds UCS-2 and UTF-8 support that previous versions lacked,
> but only for single two-byte codes in UCS-2 and up to three bytes per
> UTF-8 character :-( - and this is still in alpha!

At the risk of upsetting the open-source faithful, that is just plain
lazy. Anyone who can master the wizardly details of building a powerful
(and commercially successful) database program can figure out how to
slap two surrogates together without destroying performance.
Constraining UTF-8 to the BMP is even less defensible, since there is no
performance penalty in allowing four-byte UTF-8 sequences.
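
For readers less familiar with the mechanics, here is a minimal Python sketch of the two operations mentioned above: combining a UTF-16 surrogate pair into one supplementary code point, and writing that code point as a four-byte UTF-8 sequence. It is not code from this thread or from MySQL; the function names combine_surrogates and encode_utf8 are hypothetical, and validation is kept to the basics.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one code point (U+10000..U+10FFFF)."""
    if not (0xD800 <= high <= 0xDBFF):
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not (0xDC00 <= low <= 0xDFFF):
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the value; add back the 0x10000 offset.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # Supplementary planes: the four-byte form that a BMP-only limit rejects.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# U+1D11E MUSICAL SYMBOL G CLEF is the pair D834 DD1E in UTF-16.
cp = combine_surrogates(0xD834, 0xDD1E)
print(hex(cp))                # 0x1d11e
print(encode_utf8(cp).hex())  # f09d849e
```

Both steps come down to a few shifts and masks per character. A production implementation would also have to decide what to do with unpaired surrogates in its input.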

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 01:50:10 EST