Re: UTF-16 inside UTF-8

From: YTang0648@aol.com
Date: Wed Nov 05 2003 - 14:16:54 EST

Next message: YTang0648@aol.com: "Re: UTF8 and COntrol Characters"

Previous message: Language Analysis Systems, Inc. Unicode list reader: "RE: ZWJ/ZWNJ in combining mark sequences"
Maybe in reply to: Jill Ramonsky: "UTF-16 inside UTF-8"
Next in thread: Peter Kirk: "Re: UTF-16 inside UTF-8"
Reply: Peter Kirk: "Re: UTF-16 inside UTF-8"

In a message dated 11/5/2003 3:55:46 AM Pacific Standard Time,
peterkirk@qaya.org writes:
Agreed. But to be fair to MySQL, they do mention as a potential problem
that three bytes have to be allocated in strings for each UTF-8
character. For full UTF-8 support they would need four bytes per
character which would, from their perspective, be a greater problem.
Also I suspect that Unicode data is actually being stored in 16-bit
entities, and that the major issue is the extra complication of handling
surrogate pairs within that representation (rather than the trivial one
of converting such pairs to and from valid UTF-8).

I don't think the question of how to store Unicode data is unique to
MySQL, right? Basically, they have the following choices:

UCS2 - what they use today, as you describe
UTF-16 - what I think they should do, but that might create issues for
the "index" or substring operations
UTF-8
UCS4 or UTF-32 - what they think they may need if they support
surrogates.
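
To make the trade-off between these choices concrete, here is a minimal Python sketch. The helper names (encoded_sizes, utf16_units) and the sample characters are my own illustration and have nothing to do with MySQL's code. It shows how many bytes one character needs in each encoding, and why a supplementary character complicates "index" or substring operations on UTF-16.

```python
def encoded_sizes(ch):
    """Return the number of bytes one character needs in each encoding form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


def utf16_units(text):
    """Split text into its 16-bit UTF-16 code units (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A BMP character (U+0E01, Thai KO KAI) fits in one 16-bit unit and
# three UTF-8 bytes.
bmp = encoded_sizes("\u0E01")

# A supplementary character (U+10400) needs four bytes in every form,
# and in UTF-16 those four bytes are a surrogate pair.
supplementary = encoded_sizes("\U00010400")

# Three characters, but four UTF-16 code units: the unit at index 1 is
# only half of the second character, which is the substring problem.
units = utf16_units("a\U00010400b")
```

With UCS2 the surrogate pair simply cannot be interpreted as one character, and with UCS4 or UTF-32 every character costs four bytes but indexing by character is direct.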

Mozilla uses UTF-16 internally. glib uses UCS4, as I understand it, for w_char
in their "vendor definition". MS uses UTF-16 for the Win32 API and OLE API (not
sure about the internals, since they are not open source). Tcl uses UCS2 (and
their converter does not handle surrogates).
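
For reference, "handling surrogates" in a converter can be sketched roughly as follows. This is only an illustration in Python under my own assumptions: the function names are hypothetical and it is not the code of Tcl, Mozilla, or MySQL. The idea is to combine each high/low pair into one code point before producing UTF-8, rather than encoding each 16-bit unit separately.

```python
def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units_to_utf8(units):
    """Convert a list of 16-bit UTF-16 code units to valid UTF-8 bytes."""
    chars = []
    i = 0
    while i < len(units):
        unit = units[i]
        next_is_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if 0xD800 <= unit <= 0xDBFF and next_is_low:
            # A proper pair becomes a single code point (4 bytes in UTF-8).
            chars.append(chr(combine_surrogates(unit, units[i + 1])))
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # A lone surrogate has no valid UTF-8 form, so reject it.
            raise ValueError("unpaired surrogate at index %d" % i)
        else:
            chars.append(chr(unit))
            i += 1
    return "".join(chars).encode("utf-8")


# U+10400 is stored in UTF-16 as the pair D801 DC00.
converted = utf16_units_to_utf8([0x0061, 0xD801, 0xDC00])
```

A converter that skips the pairing step would instead emit two 3-byte sequences for the pair, which is the "UTF-16 inside UTF-8" form this thread is about and is not valid UTF-8.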

This is a generic issue. Why is it so special with MySQL? Because of the SQL API?

==================================
Frank Yung-Fong Tang
System Architect, Iñtërnâtiônàl Dèvélôpmeñt, AOL Intèrâçtívë Sërviçes
AIM:yungfongta mailto:ytang0648@aol.com Tel:650-937-2913
Yahoo! Msg: frankyungfongtan

John 3:16 "For God so loved the world that he gave his one and only Son, that
whoever believes in him shall not perish but have eternal life.

Does your software display Thai language text correctly for Thailand users?
-> Basic Concept of Thai Language linked from Frank Tang's
Iñtërnâtiônàlizætiøn Secrets
Want to translate your English text to something Thailand users can
understand ?
-> Try English-to-Thai machine translation at
http://c3po.links.nectec.or.th/parsit/


This archive was generated by hypermail 2.1.5 : Wed Nov 05 2003 - 15:12:27 EST