ISO vs Unicode UTF-8 (was RE: UTF-8 signature in web and email)

From: Peter_Constable@sil.org
Date: Fri May 25 2001 - 11:28:55 EDT


On 05/25/2001 02:13:36 AM Bill Kurmey wrote:

>Are there not 2 versions of UTF-8, the Unicode Standard (maximum of 4
>octets) and the ISO/IEC Annex/Amendment to 10646 (maximum of 6 octets)?

The distinction between the Unicode and ISO versions of UTF-8 is largely
irrelevant in practice. ISO UTF-8 allows a maximum of 6 octets because it
is designed to accommodate a larger codespace than Unicode, but the portion
of that codespace beyond U+10FFFF is now permanently reserved. For all
practical purposes, the usable ISO codespace is the same as that for
Unicode, and thus the usable ISO UTF-8 sequences are at most 4 octets long.
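
To make the arithmetic concrete, here is a rough sketch in Python of how
many octets the original 6-octet scheme needs for a given code point. The
function name utf8_length and the constant UNICODE_MAX are mine, purely for
illustration, and the range boundaries are the ones I understand the
original 31-bit definition to use; treat it as a sketch, not a normative
statement of either standard.

```python
# Upper bound of each length class in the original 6-octet UTF-8 scheme
# (assumed boundaries for the 31-bit UCS-4 codespace).
LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]

# Last code point of the Unicode codespace.
UNICODE_MAX = 0x10FFFF


def utf8_length(code_point: int) -> int:
    """Return the number of octets needed for code_point (1 to 6)."""
    if code_point < 0 or code_point > LIMITS[-1]:
        raise ValueError("outside the 31-bit codespace")
    for octets, limit in enumerate(LIMITS, start=1):
        if code_point <= limit:
            return octets
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Everything up to U+10FFFF fits in at most 4 octets; only values in
    # the reserved region beyond it would need 5 or 6.
    for cp in (0x41, 0x7FF, 0xFFFF, UNICODE_MAX, 0x200000, 0x7FFFFFFF):
        print(f"{cp:#x}: {utf8_length(cp)} octet(s)")
```

Since U+10FFFF falls inside the 4-octet range (which runs up to 0x1FFFFF),
the 5- and 6-octet forms are only ever reached by values in the reserved
part of the ISO codespace.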

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


