RE: How is UTF8, UTF16 and UTF32 encoded?

From: Theodore H. Smith (delete@softhome.net)
Date: Thu May 30 2002 - 08:04:53 EDT


> Many of the explanations of UTF-8 discuss encoding of code points on Code
> Planes 1-16 using the intermediate concept of surrogates as in UTF-16. I
> believe that this is both unnecessary and misleading, as UTF-8 is
> fundamentally a direct 21-bit encoding scheme, as may be seen in the
> attached document. So, I believe that the concept of surrogates is not
> relevant for UTF-8 encoding on Code Planes above the BMP.
>
> This is a slightly different explanation of how UTF-8 works, written by me
> for the Ultracode(r) bar code spec (Ultracode encodes all of Unicode 3
> directly). If any Unicodotti find any errors in it... please let me know!
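
To make the quoted point concrete, here is a minimal Python sketch. It is not
taken from the attached document; the function names utf16_units and
utf8_payload_bits are made up for illustration. It assumes a code point above
the BMP and shows that UTF-16 needs a surrogate pair, while the four UTF-8
bytes simply carry 3 + 6 + 6 + 6 = 21 payload bits of the code point itself.

```python
def utf16_units(cp):
    """UTF-16: a code point above the BMP is split into a surrogate pair."""
    if cp < 0x10000:
        return [cp]                      # BMP: one 16-bit unit, no surrogates
    v = cp - 0x10000                     # 20 bits are left after the offset
    return [0xD800 | (v >> 10),          # high surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate: bottom 10 bits


def utf8_payload_bits(data):
    """Rebuild a code point from a 4-byte UTF-8 sequence: 3 + 6 + 6 + 6 bits."""
    b0, b1, b2, b3 = data
    return (((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
            | ((b2 & 0x3F) << 6) | (b3 & 0x3F))


cp = 0x10400                             # example code point on Plane 1
utf8 = chr(cp).encode("utf-8")           # bytes F0 90 90 80
print("UTF-8 :", utf8.hex(" "))
print("UTF-16:", " ".join(f"{u:04X}" for u in utf16_units(cp)))  # D801 DC00
print("UTF-32:", f"{cp:08X}")            # the code point itself, one 32-bit unit
print("UTF-8 payload bits give back:", f"U+{utf8_payload_bits(utf8):04X}")
```

Nothing in the UTF-8 path subtracts 0x10000 or touches the D800-DFFF range,
which is the sense in which surrogates play no part in it.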

You sent me a file that explains things, but it's in Word format (I think;
it's a .doc) and I don't have MS Word. Fortunately, I have very few MS
things. MSIE is about all.

Thanks anyhow. This whole bit encoding is kind of technical, and I guess
I could do my own calculations to get some kind of feel for what the
conversion code does to a character, but I was hoping more for some
illustrative examples. Like, let's say we take character XX: first we see
how many trailing bytes it has, like this, and so on, giving a
step-by-step example... Almost like code, but with the intermediate values
listed and explained.
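
The kind of walk-through I mean might look something like the Python sketch
below. It is only an illustration under my own assumptions: the function name
utf8_encode_steps is hypothetical, and it handles one code point at a time,
printing each intermediate value as it goes.

```python
def utf8_encode_steps(cp):
    """Encode one code point to UTF-8, printing the intermediate values."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")

    # Step 1: the size of the code point decides how many trailing bytes follow.
    if cp < 0x80:
        trailing, marker = 0, 0x00       # 0xxxxxxx
    elif cp < 0x800:
        trailing, marker = 1, 0xC0       # 110xxxxx
    elif cp < 0x10000:
        trailing, marker = 2, 0xE0       # 1110xxxx
    else:
        trailing, marker = 3, 0xF0       # 11110xxx
    print(f"U+{cp:04X} = binary {cp:b}, trailing bytes: {trailing}")

    # Step 2: the lead byte holds the marker plus the bits left over
    # after each trailing byte has taken its 6 bits.
    lead = marker | (cp >> (6 * trailing))
    out = [lead]
    print(f"  lead byte:     {lead:08b} = 0x{lead:02X}")

    # Step 3: each trailing byte is 10xxxxxx with the next 6 bits, high to low.
    for i in range(trailing - 1, -1, -1):
        six_bits = (cp >> (6 * i)) & 0x3F
        byte = 0x80 | six_bits
        out.append(byte)
        print(f"  trailing byte: {byte:08b} = 0x{byte:02X} (payload {six_bits:06b})")

    return bytes(out)


# Example: the euro sign, U+20AC, comes out as the three bytes E2 82 AC.
print(utf8_encode_steps(0x20AC).hex(" "))
```

Running it on other values (0x41, 0xE9, 0x10400) shows the one-, two- and
four-byte cases in the same way.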

(Once again I almost sent this to ecartis)

--
     Theodore H. Smith - Macintosh Consultant / Contractor.
     My website: <www.elfdata.com/>



This archive was generated by hypermail 2.1.2 : Thu May 30 2002 - 12:25:43 EDT