Using UTF-8 to handle characters in the supplementary planes by way of using
two separate code points in the surrogate range is NOT considered

Currently it is legal to interpret them but *not* to generate them (multiple
refs on the Unicode site). Therefore, I hope you are mistaken about the
rumor since this would be a Bad Thing (tm).
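
To make the difference concrete, here is a rough Python sketch of the two
byte sequences in question. The function names are my own, U+10400 is just an
arbitrary supplementary-plane example, and the "surrogatepass" error handler
is Python's escape hatch for forcing surrogates through its UTF-8 codec. It
is an illustration under those assumptions, not anyone's actual proposal.

```python
def surrogate_pair(cp):
    """Split a supplementary-plane code point into its UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def utf8_standard(cp):
    """Ordinary UTF-8: one 4-byte sequence for a supplementary character."""
    return chr(cp).encode("utf-8")


def utf8_via_surrogates(cp):
    """The rumoured form: each surrogate encoded as its own 3-byte sequence.

    Python's strict UTF-8 codec refuses to emit surrogates, so the
    'surrogatepass' error handler is needed to produce these bytes at all.
    """
    high, low = surrogate_pair(cp)
    return (chr(high) + chr(low)).encode("utf-8", "surrogatepass")


if __name__ == "__main__":
    cp = 0x10400  # arbitrary example code point
    print(utf8_standard(cp).hex(" "))        # f0 90 90 80
    print(utf8_via_surrogates(cp).hex(" "))  # ed a0 81 ed b0 80
```

So the same character becomes six bytes instead of four, and the two forms
do not compare equal byte for byte. Note that Python's strict decoder rejects
the six-byte form outright, which is stricter than the "legal to interpret"
situation I describe above.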


> I have heard a rumour (i.e. my source is not involved in the reported
> activity) that:
> <quote>
> SAP, PeopleSoft, Siebel, Oracle and others are actually
> in the process of proposing a new format of UTF that will cause a UTF-16
> surrogate pair to become two 3-byte UTF-8 codepoints so that UTF-8 will
> have the same behaviour as UTF-16, that is, a surrogate will be two UTF-8
> code points.
> </quote>
> Can anyone corroborate this, and, if it's true, offer an opinion on it?
> I may add that, as some of you already know, a small group in the UK
> (which includes me) is working on a proposal intended to improve the SQL
> standard
> specification with regard to the treatment of Unicode data by an
> SQL-implementation.
> The competent bodies are ISO/IEC SC 32/WG 3, ANSI NCITS H2, BSI IST/40 and
> other national bodies.
> We expect that most of the parties most interested, principally SQL
> implementors, are already represented either directly or indirectly on one
> or more competent bodies. But if anyone else is interested, please feel
> free to download the current, incomplete, provisional draft of the proposal
> where the files containing two different versions are jms01v6 and jms01v7
> each of which is in both w97.doc and .pdf format.
> All comments will be seriously considered.
> Mike Sykes
