RE: Nonsense in http://www.unicode.org/Public/PROGRAMS/CVTUTF/CVTUTF.C?

From: Ayers, Mike (Mike_Ayers@bmc.com)
Date: Wed Aug 22 2001 - 21:03:19 EDT


> From: Michael (michka) Kaplan [mailto:michka@trigeminal.com]
> Sent: Wednesday, August 22, 2001 03:59 PM

> From: "Marcin 'Qrczak' Kowalczyk" <qrczak@knm.org.pl>
>
> > Functions ConvertUCS4toUTF8 and ConvertUTF8toUCS4 use surrogates
> > in UCS4. In particular ConvertUTF8toUCS4 converts a character above
> > U+FFFF into two UCS4 words. Why is this absurd there?!
>
> UCS-4 has no knowledge of surrogate code points or their
> significance; it is a purely algorithmic conversion. Not sure
> why the results would be so surprising, given this?

        I know nothing of UCS-4, but if, as the name implies, it uses 4
bytes per word, and needs two of those to represent quantities greater than
0xffff, i.e. 8 bytes to represent a 3 byte quantity, then, yes, I would be
surprised (and as an engineer, disgusted).
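
        To make the arithmetic concrete, here is a minimal sketch in C. It
is not taken from CVTUTF.C; the helper names decode_utf8_4 and to_surrogates
are made up for illustration, and the decoder assumes a well-formed 4-byte
UTF-8 sequence with no validation. It shows that a character above U+FFFF fits
in one 32-bit word, whereas splitting it into a surrogate pair and storing
each half in its own 32-bit word takes 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one well-formed 4-byte UTF-8 sequence
 * (lead byte 0xF0..0xF4) into a single 32-bit code point.
 * Assumption: the caller has already validated the input. */
static uint32_t decode_utf8_4(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6)  |
            (uint32_t)(s[3] & 0x3F);
}

/* Hypothetical helper: split a code point in U+10000..U+10FFFF into a
 * UTF-16 style surrogate pair. Each half is returned in a 32-bit word
 * to mirror the behaviour complained about in this thread. */
static void to_surrogates(uint32_t cp, uint32_t *hi, uint32_t *lo)
{
    assert(cp > 0xFFFF && cp <= 0x10FFFF);
    cp -= 0x10000;                 /* 20 significant bits remain */
    *hi = 0xD800 + (cp >> 10);     /* top 10 bits */
    *lo = 0xDC00 + (cp & 0x3FF);   /* bottom 10 bits */
}
```

        For the UTF-8 bytes F0 90 80 80, decode_utf8_4 yields 0x10000 in one
4-byte word; to_surrogates turns that into 0xD800 and 0xDC00, which occupy
two 4-byte words if kept as 32-bit values.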

/|/|ike
