Re: Understanding UTF-8 and ISO correlation's.

From: A. Vine (avine@eng.sun.com)
Date: Tue May 25 1999 - 14:00:28 EDT


Hello James,

James Liptak wrote:
>
> Hello,
>
> I am James Liptak and trying to understand the correlation between UTF-8 and
> ISO standards. From what I can see we have ISO -10646 is the mapping but
> then goes on to saying ISO-8859 -1 (Latin- 1) not the Extended Latin
> 8859 -2.

To clarify, UTF-8 is an encoding of Unicode described in RFC 2279 (an IETF
standard, freely available on the Internet). ISO-8859-1 is a value subset of
Unicode: each of its values 0-255 denotes the same character as the Unicode
code point with that value. This is not true for ISO-8859-2. However, for some
of the other ISO-8859-x charsets, the chars associated with values 128-255
largely correspond to a contiguous section of Unicode, meaning you only have
to add a fixed offset value to get the corresponding Unicode value.
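
As a rough illustration of the paragraph above, here is a minimal Python
sketch. It relies on Python's built-in codec tables (the names "latin-1",
"iso8859_2" and "iso8859_5" are Python's aliases, not part of the original
discussion), and the helper code_points and the constant CYRILLIC_OFFSET are
hypothetical names chosen for this example. ISO-8859-5 is used as one example
of an offset-style charset; the sketch also lists the values that do not follow
the offset, since the correspondence is not exact for every value.

```python
# Sketch: compare ISO-8859 byte values with Unicode code points,
# using the codec tables that ship with Python.

def code_points(codec, byte_values):
    """Decode each byte on its own and return the Unicode code points."""
    return [ord(bytes([b]).decode(codec)) for b in byte_values]

all_bytes = range(256)

# ISO-8859-1: every value 0-255 equals its Unicode code point.
latin1_is_subset = code_points("latin-1", all_bytes) == list(all_bytes)

# ISO-8859-2: the upper half does not line up with Unicode values.
# Value 0xA1 is LATIN CAPITAL LETTER A WITH OGONEK, not code point 0xA1.
latin2_a1 = code_points("iso8859_2", [0xA1])[0]

# ISO-8859-5 (Cyrillic): assumed fixed offset into the Cyrillic block.
CYRILLIC_OFFSET = 0x360

# Collect the values in 0xA1-0xFF that do NOT follow the fixed offset.
exceptions = [
    b for b in range(0xA1, 0x100)
    if code_points("iso8859_5", [b])[0] != b + CYRILLIC_OFFSET
]

print("ISO-8859-1 is a value subset of Unicode:", latin1_is_subset)
print("ISO-8859-2 value 0xA1 maps to U+%04X" % latin2_a1)
print("ISO-8859-5 values off the fixed offset:", [hex(b) for b in exceptions])
```

Decoding one byte at a time keeps the comparison explicit; a real converter
would normally use the codec directly rather than offset arithmetic.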

Note that values 0-127 of all the ISO-8859 charsets correspond to the same
characters in every instance (US-ASCII); only values 128-255 are re-mapped.
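
The point about the shared lower half can be checked with a short Python
sketch, again assuming Python's built-in codec tables. The range of parts
(1 through 9) simply follows the question, and the choice of value 0xE4 as a
sample from the upper half is arbitrary. The last check, that UTF-8 encodes
these same 128 characters as the same single bytes, is an addition of mine
rather than something stated above.

```python
# Sketch: values 0-127 decode identically in ISO-8859-1 through -9,
# while a sample value from the upper half (0xE4) does not.

ascii_bytes = bytes(range(128))
ascii_text = ascii_bytes.decode("ascii")
parts = range(1, 10)  # ISO-8859-1 .. ISO-8859-9

# Decode the lower half with every part and compare against US-ASCII.
same_everywhere = all(
    ascii_bytes.decode(f"iso8859_{n}") == ascii_text for n in parts
)

# Decode one upper-half value with every part to see the re-mapping.
upper = {n: bytes([0xE4]).decode(f"iso8859_{n}") for n in parts}

# UTF-8 also represents the US-ASCII characters as the same single bytes.
utf8_matches = ascii_text.encode("utf-8") == ascii_bytes

print("Lower half identical in all parts:", same_everywhere)
for n, ch in upper.items():
    print("ISO-8859-%d value 0xE4 -> U+%04X" % (n, ord(ch)))
print("UTF-8 bytes match for US-ASCII:", utf8_matches)
```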

>
> My question is: Does UTF-8 use all of ISO - 8859 -(1-9) or is it specific
> and only handles specific parts of ISO - 8859 which according the
> publication V2.0 of Unicode standard is vague.
>

I hope this helps,
Andrea

-- 
Andrea Vine
Sun Internet Mail Server i18n architect
avine@eng.sun.com
Romanes eunt domus.


