>Unicode is the code, which is based on 16 bit chunks of ether or whatever,
>UTF-8 is a biased transformation format...

That's too simple to capture the current reality, as others have been
indicating. The full story is available in UTR17 (the Character Encoding
Model), and *everybody* on this list ought to read and digest it - of all
the UTRs, it's probably the one that's most useful to the broadest audience.

In a nutshell, Unicode started life being 16-bit monowidth, but the need to
extend and merge with ISO 10646 made life more complicated. At this point,
there is no real option but to say that Unicode is a 21 (or 20.1) bit*
character set combined with various encoding forms and schemes based on 8,
16 or 32 bit data types.
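
To make the "one character set, several encoding forms" point concrete,
here is a rough Python sketch. The choice of U+10400 as the sample
character and the variable names are mine, purely for illustration; the
codec names are the standard ones Python ships with.

```python
# One code point, three encoding forms built on 8-, 16- and 32-bit units.
# U+10400 is picked only because it lies above FFFF, i.e. outside what a
# single 16-bit unit can hold.
ch = "\U00010400"

utf8 = ch.encode("utf-8")       # sequence of 8-bit code units
utf16 = ch.encode("utf-16-be")  # sequence of 16-bit code units, big-endian
utf32 = ch.encode("utf-32-be")  # sequence of 32-bit code units, big-endian

# Number of code units each form needs for this one character.
units_utf8 = len(utf8)          # 4 units of 8 bits
units_utf16 = len(utf16) // 2   # 2 units of 16 bits (a surrogate pair)
units_utf32 = len(utf32) // 4   # 1 unit of 32 bits

print(utf8.hex(), utf16.hex(), utf32.hex())
```

The character is the same in all three cases; only the width and number
of the code units differ.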

* The codespace for the encoded character set takes a little explanation.
The simplification is that it's 0 - 10FFFF (which takes 21 bits to
represent but doesn't go as far as 21 bits would allow - that would be
1FFFFF). Actually, you have to remove from this D800 - DFFF (the surrogate
values) and 34 values that match the pattern nnFFFE and nnFFFF (the last
two values in each of the 17 planes).
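
The arithmetic behind the footnote can be checked with a few lines of
Python. This is only a sketch under my own assumptions: the 34 values are
taken to be the last two code points of each of the 17 planes (xxFFFE and
xxFFFF), and the names below are made up for the example. Later versions
of the standard also set aside FDD0 - FDEF, which is not counted here.

```python
import math

# Size of the codespace 0 - 10FFFF.
CODESPACE_SIZE = 0x10FFFF + 1

# Bits needed to represent the largest value, and the fractional figure.
bits_needed = (0x10FFFF).bit_length()       # whole bits for 10FFFF
fractional_bits = math.log2(CODESPACE_SIZE)  # where "20.1" comes from

# D800 - DFFF, reserved for surrogates.
surrogate_count = 0xDFFF - 0xD800 + 1

# The last two values of each of the 17 planes (assumed pattern, see above).
plane_end_values = [
    (plane << 16) | low
    for plane in range(17)
    for low in (0xFFFE, 0xFFFF)
]

remaining = CODESPACE_SIZE - surrogate_count - len(plane_end_values)

print(bits_needed, round(fractional_bits, 1))
print(surrogate_count, len(plane_end_values), remaining)
```

So 21 whole bits are needed, but the codespace only fills about 20.1 bits'
worth of values, and a little over two thousand of those are excluded.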

