UTF8 question

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 27 2000 - 15:14:19 EDT


Jeu asked:

> Is the UTF-8 encoding scheme the same irrespective of whether the
> underlying processor is little endian or big endian,

The answer to this part of the question is yes. Since UTF-8 is interpreted
as a sequence of bytes, there is no endian problem as there is for
encoding forms that use 16-bit or 32-bit code units.
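
As a quick illustration (a minimal sketch, not part of any specification),
the Python snippet below encodes one character in UTF-8 and in the two byte
orders of UTF-16. The choice of U+20AC and the variable names are only for
illustration. The UTF-8 bytes come out in one fixed order, while the single
16-bit code unit serializes differently depending on byte order.

```python
# Encode one character (U+20AC, the euro sign) in three encodings.
text = "\u20ac"

utf8 = text.encode("utf-8")          # byte-oriented: one fixed byte sequence
utf16_le = text.encode("utf-16-le")  # 16-bit code unit, low byte first
utf16_be = text.encode("utf-16-be")  # 16-bit code unit, high byte first

print(utf8.hex(" "))      # e2 82 ac
print(utf16_le.hex(" "))  # ac 20
print(utf16_be.hex(" "))  # 20 ac
```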

> or if the system uses
> ASCII or EBCDIC encoding.

This does make a difference, however. Standard UTF-8 won't survive in
an EBCDIC system, because EBCDIC arranges its control codes differently
from ASCII: byte values that are ordinary characters in UTF-8 can be
treated as control codes there.

See Unicode Technical Report #16, UTF-EBCDIC, for the specification of
a transformation of UTF-8 that will work inside an EBCDIC system.
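
To see the mismatch concretely, here is a rough sketch in Python. It
assumes that Python's "cp037" codec (one particular EBCDIC code page) is a
fair stand-in for "an EBCDIC system"; it does not implement UTF-EBCDIC. It
shows that the UTF-8 bytes for "A1" are not the EBCDIC bytes for "A1", and
that an EBCDIC reader would see the digit's byte as a control code.

```python
import unicodedata

# Assumption: the cp037 code page stands in for "an EBCDIC system".
utf8_bytes = "A1".encode("utf-8")    # same byte values as ASCII
ebcdic_bytes = "A1".encode("cp037")  # EBCDIC puts letters and digits elsewhere

# Read the UTF-8 bytes the way an EBCDIC program would.
misread = utf8_bytes.decode("cp037")
categories = [unicodedata.category(ch) for ch in misread]

print(utf8_bytes.hex(" "))    # 41 31
print(ebcdic_bytes.hex(" "))  # c1 f1
print(categories)             # "Cc" marks a control character
```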

--Ken


