Re: Understanding UTF-8 and ISO correlation's.

From: schererm@us.ibm.com
Date: Tue May 25 1999 - 14:01:42 EDT


unicode/iso-10646-1 covers the _repertoire_ of all of the iso-8859-* parts that
were published until recently. i am not sure about the newest 8859 parts.
covering the repertoire means that all of their characters are there - and
many more besides.
moreover, the first 256 codes in unicode are the same numbers/values as the
codes in iso-8859-1.
naturally, this does not hold for all 256 codes of any other iso-8859-* part,
i.e., some of their characters have different numbers in unicode.
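
as a rough illustration of the point above, here is a small python sketch that
compares byte values with unicode code points. it assumes python 3 and its
built-in codecs named "iso-8859-1" and "iso-8859-2"; the variable names are
made up for this example.

```python
# compare byte values with unicode code points for two iso-8859 parts.
# assumption: python's built-in "iso-8859-1" and "iso-8859-2" codecs.

all_bytes = bytes(range(256))

# iso-8859-1: every byte value equals the unicode code point it maps to.
latin1_points = [ord(ch) for ch in all_bytes.decode("iso-8859-1")]
latin1_matches = latin1_points == list(range(256))

# iso-8859-2: collect the byte values whose code point differs from the byte.
latin2_points = [ord(ch) for ch in all_bytes.decode("iso-8859-2")]
latin2_differences = {b: cp for b, cp in enumerate(latin2_points) if b != cp}

print("iso-8859-1 identical to first 256 unicode codes:", latin1_matches)
for byte_value, code_point in sorted(latin2_differences.items())[:5]:
    print(f"iso-8859-2 byte 0x{byte_value:02X} -> U+{code_point:04X}")
```

for example, byte 0xA1 in iso-8859-2 is LATIN CAPITAL LETTER A WITH OGONEK,
which is U+0104 in unicode, not U+00A1.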

utf-8 is one of several encodings for unicode/iso-10646. it defines how to
serialize 31-bit codes into bytes and uses more than one byte for non-ascii
characters (2 bytes for most of the extended characters in the iso-8859-*
parts; a few, such as the numero sign in iso-8859-5, take 3).
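
to make the byte counts concrete, a minimal python sketch follows. it assumes
python 3 and its built-in "utf-8" codec; the sample characters and the names
`samples` and `utf8_lengths` are chosen only for this example.

```python
# count the utf-8 bytes needed for a few sample characters.
# assumption: the samples below were picked by hand for illustration.
samples = {
    "A": "ascii letter",
    "\u00e9": "e with acute, in iso-8859-1",
    "\u0104": "a with ogonek, in iso-8859-2",
    "\u0416": "cyrillic zhe, in iso-8859-5",
}

# map each character to the length of its utf-8 byte sequence.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, note in samples.items():
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} ({note}): {encoded.hex()} = {len(encoded)} byte(s)")
```

the ascii letter stays a single byte, while each of the three extended
characters shown here becomes a two-byte sequence.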

have a look at http://www.unicode.org and at http://www.dkuug.dk/jtc1/sc2/wg2/

markus

Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
schererm@us.ibm.com
                        Unicode is here! --> http://www.unicode.org/

"James Liptak" <jamie@intermind.com> on 99-05-25 12:52:09

To: Unicode List <unicode@unicode.org>
cc: (bcc: Markus Scherer/Raleigh/Contr/IBM)
Subject: Understanding UTF-8 and ISO correlation's.

Hello,

I am James Liptak, and I am trying to understand the correlation between UTF-8
and the ISO standards. From what I can see, ISO-10646 is the mapping, but the
text then goes on to mention ISO-8859-1 (Latin-1) and not the Extended Latin
ISO-8859-2.

My question is: Does UTF-8 use all of ISO-8859-(1-9), or does it handle only
specific parts of ISO-8859? Version 2.0 of the Unicode Standard is vague on
this point.

James Liptak
Jamie@intermind.com


