Re: Thank you for all the good information, sUTF32ToUTF8 function

From: DougEwell2@cs.com
Date: Fri Nov 09 2001 - 00:35:08 EST


In a message dated 2001-11-08 21:09:35 Pacific Standard Time,
Peter_Constable@sil.org writes:

> Anybody willing to check this for me?
>
> CString sUTF32ToUTF8( LONG lUTF32 )

I haven't run it through a compiler, but most of it looks fine. However, the
algorithm would be a lot more transparent if the constants were hex instead
of decimal (e.g. 0x1000 instead of 4096).
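To illustrate the point about hex, here is a minimal sketch of the UTF-8 length thresholds. Utf8Length and the kMax... constants are hypothetical names, not taken from Peter's code, which isn't quoted here. Written in hex, the first three boundaries are visibly powers of two (0x80 = 2^7, 0x800 = 2^11, 0x10000 = 2^16). Those match the payload bits of the one-, two- and three-byte forms. The last boundary, 0x110000, is one past U+10FFFF. None of this shows in 128, 2048, 65536 and 1114112.

```cpp
#include <cstdint>

// Exclusive upper bounds of each UTF-8 length class, written in hex.
// The first three are 2^7, 2^11 and 2^16 (the payload widths of the
// 1-, 2- and 3-byte forms); the last is one past U+10FFFF.
constexpr std::uint32_t kMax1Byte = 0x80;
constexpr std::uint32_t kMax2Byte = 0x800;
constexpr std::uint32_t kMax3Byte = 0x10000;
constexpr std::uint32_t kMax4Byte = 0x110000;

// The same values in decimal, for comparison.
static_assert(kMax1Byte == 128 && kMax2Byte == 2048, "hex == decimal");
static_assert(kMax3Byte == 65536 && kMax4Byte == 1114112, "hex == decimal");

// Number of bytes needed to encode a code point, or 0 if out of range.
constexpr int Utf8Length(std::uint32_t cp) {
    return cp < kMax1Byte ? 1
         : cp < kMax2Byte ? 2
         : cp < kMax3Byte ? 3
         : cp < kMax4Byte ? 4
         : 0;
}
```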

Also, I would have written it to use bit shifts and masks instead of divisions
and modulos (lUTF32 >> 12 instead of lUTF32 / 4096, lUTF32 & 0x3F instead of
lUTF32 % 64).
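The following is a minimal sketch of a shift-and-mask encoder, under a few assumptions. It takes an unsigned 32-bit value (std::uint32_t) instead of the original LONG, and returns a std::string instead of MFC's CString. EncodeUtf8 is a hypothetical name. It pairs shifts with masks (& 0x3F), the usual replacement for % 64. With an unsigned parameter, x >> 12 and x / 4096 agree for every value. With a signed LONG they can differ for negative input (-1 >> 12 is typically -1, while -1 / 4096 is 0). That is one more reason to prefer an unsigned type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 using shifts (>>) and masks (&) in place
// of division (/) and modulo (%). For unsigned values, cp >> 6 == cp / 64
// and (cp & 0x3F) == cp % 64, so the bytes are identical, but each step
// now names the bits it extracts.
std::string EncodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF && "code point out of range");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                          // 0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));            // 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));           // 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));           // 11110xxx
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));  // 10xxxxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));          // 10xxxxxx
    }
    return out;
}
```

For example, EncodeUtf8(0x20AC) (EURO SIGN) yields the bytes E2 82 AC.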

And I don't think you're supposed to exclude the surrogate code space (0xD800
through 0xDFFF) from normal processing. (This is the "D29 conundrum" -- all
UTFs must support encoding of non-characters, including unpaired surrogates,
even though UTF-16 cannot do this.) The code you provided encodes unpaired
surrogates in four bytes, by pushing them down to the final "else". That is
wrong in any event, since any value below 0x10000 fits in three bytes and a
four-byte form of it is overlong, and it is almost certainly not what the
programmer intended.
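Here is a sketch of the two possible outcomes for a surrogate such as U+D800. It assumes, as argued above, that surrogate code points are encoded rather than rejected. The function names are hypothetical. The fixed-width helpers stand in for the three-byte and four-byte branches of the original function, which isn't quoted. The three-byte path gives ED A0 80. Falling through to the four-byte path gives F0 8D A0 80. That sequence carries the same value but is not the shortest form, and decoders are expected to reject non-shortest forms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Three-byte form: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
// Under the reading of D29 above, surrogates take this path like any
// other code point in 0x800..0xFFFF.
std::string EncodeThreeBytes(std::uint32_t cp) {
    assert(cp >= 0x800 && cp <= 0xFFFF);
    std::string out;
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}

// Four-byte form: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (21 payload bits).
// This is what a surrogate gets if it falls through to the final "else".
std::string EncodeFourBytes(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    return out;
}

// Assumes s is a structurally valid four-byte sequence. It is overlong
// (non-shortest form) when the value it carries is below 0x10000, i.e.
// when it would have fit in three bytes.
bool IsOverlongFourByte(const std::string& s) {
    if (s.size() != 4) return false;
    std::uint32_t cp =
        ((static_cast<unsigned char>(s[0]) & 0x07u) << 18) |
        ((static_cast<unsigned char>(s[1]) & 0x3Fu) << 12) |
        ((static_cast<unsigned char>(s[2]) & 0x3Fu) << 6) |
        (static_cast<unsigned char>(s[3]) & 0x3Fu);
    return cp < 0x10000;
}
```

A caller that prefers to reject surrogates can test 0xD800 <= cp && cp <= 0xDFFF before encoding. Either way, the four-byte result is never the right one.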

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Fri Nov 09 2001 - 01:26:21 EST