RE: How is UTF8, UTF16 and UTF32 encoded?

From: Rick Cameron
Date: Thu May 30 2002 - 14:33:58 EDT

Before anyone flames me, let me report that I just discovered that the
online index has links from its entries to the appropriate spot in the other
PDF files.

- rick

-----Original Message-----
From: Rick Cameron
Sent: Thursday, 30 May 2002 11:31
To: 'Theodore H. Smith'; Ecartis
Subject: RE: How is UTF8, UTF16 and UTF32 encoded?

The Unicode Standard 2.0 had a table in Appendix A that is, I think, just
what you're asking for. I can't find this table in the online version of TUS
3.0 (it's not very useful that the online index gives page numbers, when
there's no way to map a page number to the appropriate chapter!)

Does anyone know whether this table (A-3 on page A-7) is available online?

- rick

-----Original Message-----
From: Theodore H. Smith
Sent: Thursday, 30 May 2002 5:05
To: Ecartis
Subject: RE: How is UTF8, UTF16 and UTF32 encoded?

> Many of the explanations of UTF-8 discuss encoding of code points on
> Code Planes 1-16 using the intermediate concept of surrogates as in
> UTF-16. I believe that this is both unnecessary and misleading, as UTF-8
> is fundamentally a direct 21-bit encoding scheme, as may be seen in the
> attached document. So, I believe that the concept of surrogates is not
> relevant for UTF-8 encoding on Code Planes above the BMP.
> This is a slightly different explanation of how UTF-8 works, written
> by me for the Ultracode(r) bar code spec (Ultracode encodes all of
> Unicode 3 directly). If any Unicodotti find any errors in it... please
> let me know!
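
To illustrate the point quoted above, here is a minimal Python sketch, not taken from the attached document, that builds the UTF-8 bytes straight from the bits of the code point. The function names utf8_encode and utf16_surrogates are made up for this example. The surrogate pair is computed only for contrast; the UTF-8 path never uses it.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point as UTF-8, working directly on its bits."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:          # 7 bits  -> 1 byte
        return bytes([cp])
    if cp < 0x800:         # 11 bits -> 2 bytes
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 16 bits -> 3 bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits -> 4 bytes; no surrogates involved
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Return the UTF-16 surrogate pair for a code point above the BMP."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP have surrogate pairs")
    v = cp - 0x10000
    return 0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)


if __name__ == "__main__":
    cp = 0x10400  # a code point on Plane 1
    print(utf8_encode(cp).hex(" "))                  # f0 90 90 80
    print([hex(u) for u in utf16_surrogates(cp)])    # ['0xd801', '0xdc00']
    # Cross-check against Python's built-in encoder.
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
```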

You sent me a file that explains things, but it's in Word format (I think;
it's a .doc) and I don't have MS Word. I have very few MS things; MSIE is
about all.

Thanks anyhow. This whole bit encoding is kind of technical, and I guess I
could do my own calculations to get some kind of feel for what the
conversion code does to a character, but I was hoping more for some
illustrative examples. Like, let's say we take character XX, and so first we
see how many trailing chars it has like this, and so on, giving a
step-by-step example... Almost like code but with the intermediate values
listed and
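
A step-by-step walk-through of the kind asked for here might look like the following Python sketch. It is only an illustration: the function name trace_utf8 and the wording of the printed steps are made up for this example, and "trailing bytes" is used for what the message calls trailing chars.

```python
def trace_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8, printing each intermediate value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")

    # Step 1: the size of the value decides how many trailing bytes follow.
    if cp < 0x80:
        trailing = 0
    elif cp < 0x800:
        trailing = 1
    elif cp < 0x10000:
        trailing = 2
    else:
        trailing = 3
    print(f"U+{cp:04X} = {cp:b} (binary), {trailing} trailing byte(s)")

    if trailing == 0:
        print(f"  single byte: {cp:08b} = {cp:02X}")
        return bytes([cp])

    # Step 2: peel off 6 bits at a time, lowest bits first.
    # Each trailing byte is 10xxxxxx.
    out = []
    bits = cp
    for i in range(trailing):
        low6 = bits & 0x3F
        byte = 0x80 | low6
        print(f"  trailing byte {trailing - i}: 10{low6:06b} = {byte:02X}")
        out.append(byte)
        bits >>= 6

    # Step 3: the remaining high bits go into the lead byte, whose top
    # bits (110, 1110 or 11110) announce the number of trailing bytes.
    lead_mark = {1: 0xC0, 2: 0xE0, 3: 0xF0}[trailing]
    lead = lead_mark | bits
    print(f"  lead byte: {lead:08b} = {lead:02X}")
    out.append(lead)

    out.reverse()  # bytes were collected last-to-first
    result = bytes(out)
    print(f"  result: {result.hex(' ').upper()}")
    return result


if __name__ == "__main__":
    trace_utf8(0x20AC)   # result: E2 82 AC
    trace_utf8(0x10400)  # result: F0 90 90 80
```

Running it on U+20AC shows two trailing bytes (AC, then 82) followed by the lead byte E2; the same three steps handle code points above the BMP with three trailing bytes.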

(Once again I almost sent this to ecartis)

     Theodore H. Smith - Macintosh Consultant / Contractor.
     My website: <>

This archive was generated by hypermail 2.1.2 : Thu May 30 2002 - 12:48:09 EDT