Date: Wed Nov 05 2003 - 14:16:54 EST
In a message dated 11/5/2003 3:55:46 AM Pacific Standard Time,
Agreed. But to be fair to MySQL, they do mention as a potential problem
that three bytes have to be allocated in strings for each UTF-8
character. For full UTF-8 support they would need four bytes per
character which would, from their perspective, be a greater problem.
Also I suspect that Unicode data is actually being stored in 16-bit
entities, and that the major issue is the extra complication of handling
surrogate pairs within that representation (rather than the trivial one
of converting such pairs to and from valid UTF-8).
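
To make the sizes discussed above concrete, here is a minimal Python sketch (not anything from MySQL itself). It shows how many UTF-8 bytes and UTF-16 code units a character inside the BMP and one outside it would take. The helper names `sizes` and `to_surrogates` and the two sample characters are chosen purely for illustration.

```python
def sizes(ch):
    """Return (utf8_bytes, utf16_code_units) for a single character."""
    utf8_len = len(ch.encode("utf-8"))
    utf16_units = len(ch.encode("utf-16-be")) // 2  # 2 bytes per code unit
    return utf8_len, utf16_units


def to_surrogates(code_point):
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


# Illustrative samples: one BMP character, one supplementary character.
bmp_char = "\u0e01"        # THAI CHARACTER KO KAI
supp_char = "\U00010400"   # DESERET CAPITAL LETTER LONG I

print(sizes(bmp_char))     # 3 UTF-8 bytes, 1 UTF-16 code unit
print(sizes(supp_char))    # 4 UTF-8 bytes, 2 UTF-16 code units
print([hex(u) for u in to_surrogates(0x10400)])
```

A three-byte-per-character budget therefore covers the BMP only. Anything that needs a surrogate pair in UTF-16 needs the fourth byte in UTF-8.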
I don't think the question of how to store Unicode data is unique to
MySQL, right? Basically, they have the following choices:
UCS2 - what they use today, as you describe
UTF-16 - that is what I think they should do, but that might create issues for
the "index" or substring operations
UCS4 or UTF-32 - that is what they think they may need if they support
Mozilla uses UTF-16 internally. glib uses UCS4, as I understand it, for w_char in
their "vendor definition". MS uses UTF-16 for the Win32 API and OLE API (not sure
about the internals since they are not open source). Tcl uses UCS2 (and their
converter does not handle surrogates).
This is a generic issue. Why is it so special with MySQL? Because of the SQL API?
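
The "index" or substring concern with UTF-16 mentioned above can be sketched in a few lines of Python. This is only an illustration of the general problem, not of how MySQL or any of the other products handle it. The helpers `utf16_units` and `splits_surrogate_pair` are hypothetical names made up for this example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def splits_surrogate_pair(units, index):
    """True if cutting the sequence just before units[index] separates
    a high surrogate from the low surrogate that follows it."""
    return (
        0 < index < len(units)
        and 0xD800 <= units[index - 1] <= 0xDBFF
        and 0xDC00 <= units[index] <= 0xDFFF
    )


text = "a\U00010400b"          # 3 characters, the middle one outside the BMP
units = utf16_units(text)      # 4 code units: 'a', high, low, 'b'

print(len(text), len(units))
for cut in range(len(units) + 1):
    print(cut, splits_surrogate_pair(units, cut))
```

With fixed-width UCS2 or UTF-32 storage, the n-th character is always the n-th unit. With UTF-16, a substring taken at a raw unit offset can land in the middle of a pair (the cut at offset 2 here), so index operations have to either scan or check the boundary.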
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:firstname.lastname@example.org Tel:650-937-2913
Yahoo! Msg: frankyungfongtan
John 3:16 "For God so loved the world that he gave his one and only Son, that
whoever believes in him shall not perish but have eternal life."
Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language linked from Frank Tang's
Want to translate your English text to something Thailand users can
-> Try English-to-Thai machine translation at
This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 15:12:27 EST