Re: UTF-16 inside UTF-8

Date: Wed Nov 05 2003 - 14:16:54 EST


    In a message dated 11/5/2003 3:55:46 AM Pacific Standard Time, writes:
    Agreed. But to be fair to MySQL, they do mention as a potential problem
    that three bytes have to be allocated in strings for each UTF-8
    character. For full UTF-8 support they would need four bytes per
    character which would, from their perspective, be a greater problem.
    Also I suspect that Unicode data is actually being stored in 16-bit
    entities, and that the major issue is the extra complication of handling
    surrogate pairs within that representation (rather than the trivial one
    of converting such pairs to and from valid UTF-8).
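A minimal Python sketch of the two byte-count points in the quoted text: a code point inside the BMP needs at most three UTF-8 bytes and a supplementary code point needs four, and turning a UTF-16 surrogate pair into valid UTF-8 takes only a few shifts and masks. The function names `utf8_len` and `pair_to_utf8` are hypothetical illustrations, not MySQL code.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes standard UTF-8 uses for one code point."""
    if cp < 0x80:
        return 1          # ASCII
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3          # rest of the BMP: the "three bytes per character" case
    return 4              # supplementary planes: needed for full UTF-8 support


def pair_to_utf8(high: int, low: int) -> bytes:
    """Combine a UTF-16 surrogate pair and emit the standard 4-byte UTF-8 form."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits; together they encode cp - 0x10000.
    cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

For example, the pair `0xD83D 0xDE00` combines to U+1F600, which `pair_to_utf8` encodes as the four bytes `F0 9F 98 80`, the same result Python's own UTF-8 codec produces.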
    I don't think this is a unique issue for MySQL in how to store the
    Unicode data, right? Basically, they have the following choices:

    UCS-2 - as they do today, as you describe
    UTF-16 - that is what I think they should do, but that might create issues for
    the "index" or substring operations
    UCS-4 or UTF-32 - that is what they think they may need if they support full UTF-8
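The "index" or substring concern with UTF-16 can be illustrated with a short Python sketch; the helper names are hypothetical and are not any database's API. Python's `str` indexes by code point, much like a UCS-4/UTF-32 store, while a UTF-16 store counts 16-bit code units, so the two offsets diverge once a supplementary character appears, and a naive cut can land between the two halves of a surrogate pair.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (encoded without a BOM)."""
    data = s.encode("utf-16-le")
    # Read back each little-endian 16-bit code unit.
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def unit_offset(s: str, cp_index: int) -> int:
    """Translate a code-point index into a UTF-16 code-unit offset.

    This needs a linear scan: every supplementary character before the
    index occupies two code units instead of one.
    """
    return sum(2 if ord(c) > 0xFFFF else 1 for c in s[:cp_index])


def splits_pair(units: list[int], i: int) -> bool:
    """True if cutting the unit array at offset i separates a surrogate pair."""
    return (0 < i < len(units)
            and 0xD800 <= units[i - 1] <= 0xDBFF
            and 0xDC00 <= units[i] <= 0xDFFF)
```

With `s = "a\U0001F600b"`, `len(s)` is 3 code points but `utf16_units(s)` has 4 code units, and cutting the unit array at offset 2 would leave a lone high surrogate. UCS-2 avoids this only by not representing such characters at all; UCS-4/UTF-32 avoids it by spending four bytes on every character.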

    Mozilla uses UTF-16 internally. glib, as I understand it, uses UCS-4 for
    wchar_t in its "vendor definition". MS uses UTF-16 for the Win32 API and the
    OLE API (I am not sure about the internals, since they are not open source).
    Tcl uses UCS-2 (and its converter does not handle surrogates).
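What it means for a converter to "handle surrogates" can be sketched in Python: a UCS-2-style decoder treats every 16-bit unit as its own character, while a UTF-16 decoder combines a high/low pair into one supplementary code point and rejects unpaired halves. Both functions are hypothetical illustrations, not Tcl's actual converter.

```python
def decode_ucs2(units: list[int]) -> list[int]:
    """UCS-2 view: every 16-bit unit is taken as a character on its own."""
    return list(units)


def decode_utf16_units(units: list[int]) -> list[int]:
    """UTF-16 view: combine surrogate pairs into supplementary code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (leading) surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate at unit {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate with no preceding high one
            raise ValueError(f"unpaired low surrogate at unit {i}")
        out.append(u)
        i += 1
    return out
```

On the units `[0x61, 0xD83D, 0xDE00]`, the UCS-2 view yields three "characters", two of them meaningless surrogate halves, while the UTF-16 view yields `a` and U+1F600. Raising on unpaired surrogates is one design choice; a lenient converter might substitute U+FFFD instead.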

    This is a generic issue. Why is it so special with MySQL? Because of the SQL API?

    Frank Yung-Fong Tang
    System Architect, International Development, AOL Interactive Services
    AIM:yungfongta Tel:650-937-2913
    Yahoo! Msg: frankyungfongtan

    John 3:16 "For God so loved the world that he gave his one and only Son, that
    whoever believes in him shall not perish but have eternal life."

    Does your software display Thai language text correctly for Thailand users?
    -> Basic Concept of Thai Language linked from Frank Tang's
    Internationalization Secrets
    Want to translate your English text to something Thailand users can
    understand?
    -> Try English-to-Thai machine translation at

    This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 15:12:27 EST