RE: Fwd: Wired 4.09 p. 130: Lost in Translation

From: Keld Jørn Simonsen (keld@dkuug.dk)
Date: Wed Aug 28 1996 - 22:09:06 EDT


unicode@Unicode.ORG writes:

> Pls note that UTF-8 is a 31-bit standard (not just a 24-bit standard),
> so it offers a variable-length byte encoding of all 10646 characters let
> alone all Unicode characters.

Well, to be more exact, UTF-8 is a transformation format that
can encode all ISO/IEC 10646 characters. ISO/IEC 10646 can
encode 2**31 characters (about 2.1 billion). UTF-8 transforms
a 10646 character into a sequence of between 1 and *6* octets.
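
To make the 1-to-6 octet range concrete, here is a minimal sketch of
the transformation in Python. The function name utf8_encode_31bit is
made up for illustration and is not taken from any standard or
library; the length thresholds and lead-octet markers are those of
the 31-bit UTF-8 definition as I understand it.

```python
def utf8_encode_31bit(code_point: int) -> bytes:
    """Encode a value below 2**31 as a sequence of 1 to 6 octets.

    Hypothetical helper for illustration. It is written by hand because
    Python's built-in str.encode("utf-8") stops at 4 octets.
    """
    if not 0 <= code_point < 2**31:
        raise ValueError("code point must be in the range 0 .. 2**31 - 1")

    # Values below 0x80 are represented by themselves in a single octet.
    if code_point < 0x80:
        return bytes([code_point])

    # (exclusive upper limit, total octets, marker bits of the lead octet)
    table = (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    )
    for limit, length, marker in table:
        if code_point < limit:
            break

    # Build the trailing octets from the low end, 6 payload bits each.
    octets = []
    value = code_point
    for _ in range(length - 1):
        octets.append(0x80 | (value & 0x3F))
        value >>= 6

    # Whatever bits remain go into the lead octet, below its marker.
    octets.append(marker | value)
    return bytes(reversed(octets))


if __name__ == "__main__":
    for cp in (0x41, 0xE6, 0x20AC, 0x10000, 0x200000, 0x7FFFFFFF):
        encoded = utf8_encode_31bit(cp)
        print(f"{cp:#010x} -> {len(encoded)} octet(s): {encoded.hex(' ')}")
```

The largest value, 2**31 - 1, is the one that needs all 6 octets.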

See http://www.dkuug.dk/JTC1/SC2/WG2/docs/N1335 for further reference.

keld


