Re: roundtrip on UTF8 value 1114048 ?

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Jun 09 2002 - 22:17:06 EDT


Theodore H. Smith <delete at softhome dot net> wrote:

> My code I took from Uniconv.c fails on a roundtrip, converting 1114048
> from UTF32 to UTF8, then back again.
>
> I did modify the code however to make it faster. So can anyone here
> who uses Uniconv.c tell me if a roundtrip on 1114048 works fine?

I don't know what Uniconv.c is either. I know there is a Basis
Technology product called Uniconv, but I'm pretty sure it doesn't come
with source code.

Anyway, decimal 1114048 is hex 10FFC0. (Theodore, please try to use
hexadecimal to refer to Unicode code points. Decimal is not
conventionally used for that purpose and will probably confuse people.)
The UTF-8 bytes corresponding to U+10FFC0 are F4 8F BF 80. So check
your UTF-8 encoder first to confirm that it yields exactly those four
bytes. If it does, then the problem must lie in your decoder, and since
you modified the code to make it faster, those changes are the first
place to look.
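
For comparison, here is a minimal sketch of both stages in C. It is not
taken from Uniconv.c, which I haven't seen; the names utf8_encode,
utf8_decode and UTF8_INVALID are made up for illustration, and the code
assumes you only need to convert one code point at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel returned by utf8_decode on malformed input. */
#define UTF8_INVALID 0xFFFFFFFFu

/* Hypothetical helper: encode one code point as UTF-8 into out[].
   Returns the number of bytes written (1 to 4), or 0 if cp is a
   surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* beyond the Unicode range */
}

/* Hypothetical helper: decode the single code point that starts at s[0],
   with len bytes available. Returns the code point, or UTF8_INVALID for
   truncated, overlong, surrogate, or out-of-range sequences. */
uint32_t utf8_decode(const unsigned char *s, size_t len)
{
    /* Smallest code point that legitimately needs n bytes, indexed by n. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t cp;

    if (len == 0)
        return UTF8_INVALID;
    if (s[0] < 0x80)
        return s[0];                        /* single-byte (ASCII) case */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return UTF8_INVALID;                /* stray trail byte or F8..FF */

    if (len < n)
        return UTF8_INVALID;                /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return UTF8_INVALID;            /* not a 10xxxxxx trail byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return UTF8_INVALID;
    return cp;
}
```

Calling utf8_encode(0x10FFC0, buf) should fill buf with F4 8F BF 80 and
return 4, and utf8_decode(buf, 4) should give back 0x10FFC0. If your
encoder matches but your decoder does not, the places I would look first
(this is only a guess, since I haven't seen your changes) are the mask
applied to the F4 lead byte and whatever upper-bound check is applied to
the result.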

-Doug Ewell
 Fullerton, California


