In a message dated 2001-11-08 21:09:35 Pacific Standard Time,
Peter_Constable@sil.org writes:
> Anybody willing to check this for me?
>
> CString sUTF32ToUTF8( LONG lUTF32 )
I haven't run it through a compiler, but most of it looks fine. However, the
algorithm would be a lot more transparent if the constants were hex instead
of decimal (e.g. 0x1000 instead of 4096).
Also, I would have written it to use bit shifts instead of divisions and
modulos (lUTF32 >> 12 instead of lUTF32 / 4096).
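
To illustrate both suggestions, here is a minimal sketch of how the three-byte case might look with hex constants and shifts. The function name, the unsigned long parameter, and the output buffer are assumptions made for illustration; they are not taken from the original code.

```c
/* Hypothetical helper: writes the three-byte UTF-8 form of a code point
   in the range 0x0800..0xFFFF into out[0..2]. */
void encode_three_bytes(unsigned long cp, unsigned char out[3])
{
    /* Top 4 bits; for an unsigned value, cp >> 12 equals cp / 4096. */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    /* Middle 6 bits; equivalent to (cp / 64) % 64. */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    /* Low 6 bits; equivalent to cp % 64. */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
}
```

With the masks written as 0x3F and the lead-byte markers as 0xE0 and 0x80, the bit layout of each output byte can be read directly off the code.
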
And I don't think you're supposed to exclude the surrogate code space (0xD800
through 0xDFFF) from normal processing. (This is the "D29 conundrum" -- all
UTFs must support encoding of non-characters, including unpaired surrogates,
even though UTF-16 cannot do this.) The code you provided encodes unpaired
surrogates in four bytes -- by pushing them down to the final "else" -- which
is wrong in any event (a code point below 0x10000 never takes more than three
bytes in UTF-8) and almost certainly not what the programmer intended.
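
As a rough sketch of the alternative, assuming the reading of D29 given in this message, an encoder with no special case for surrogates could look like the following. The function name, signature, and return convention are hypothetical and do not come from the code under review.

```c
/* Hypothetical encoder: writes the UTF-8 form of cp into out (which must
   hold at least 4 bytes) and returns the number of bytes written, or 0
   if cp is above 0x10FFFF. Surrogates get no special treatment. */
int utf32_to_utf8(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                 /* one byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* three bytes; 0xD800..0xDFFF land here */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {             /* four bytes: 11110xxx plus three trail bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                        /* outside the Unicode code space */
}
```

Because the branches are ordered purely by range, an unpaired surrogate such as 0xD800 falls into the three-byte branch and comes out as ED A0 80 rather than dropping through to the four-byte case.
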
-Doug Ewell
Fullerton, California
This archive was generated by hypermail 2.1.2 : Fri Nov 09 2001 - 01:26:21 EST