RE: Unicode-based Cyrillic-Latin transliteration table

From: Peter_Constable@sil.org
Date: Tue May 29 2001 - 09:39:27 EDT


On 05/29/2001 02:02:36 AM James Williams wrote:

>Can someone please help me understand whether support for double byte is
>the same as being Unicode compliant.

No.

>Any elaboration would be greatly
>appreciated.

Oh, you'd like an explanation? :-)

"Double byte" refers to a variety of legacy character set encoding
standards. Unicode is a distinct standard that uses its own encoding forms.
I'm guessing that you are assuming Unicode is a 16-bit encoding standard.
That assumption is wrong, however. (It was part of the original
philosophy, but formally ceased to be true as of Unicode 2.0 in 1996. It
has taken a little time for the terminology in the printed standard to
catch up, though.)
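
As a rough illustration of why "16-bit" is the wrong mental model, here is
a minimal sketch assuming Python 3 and its built-in codecs. The character
U+10400 is only an illustrative choice of a code point beyond U+FFFF; it
is not taken from the discussion above.

```python
# One code point beyond U+FFFF, serialized in three Unicode encoding forms.
# U+10400 is an illustrative choice; any code point above U+FFFF would do.
ch = "\U00010400"

utf8 = ch.encode("utf-8")       # four 8-bit code units
utf16 = ch.encode("utf-16-be")  # two 16-bit code units (a surrogate pair)
utf32 = ch.encode("utf-32-be")  # one 32-bit code unit

for name, data in (("UTF-8", utf8), ("UTF-16", utf16), ("UTF-32", utf32)):
    print(f"{name:7} {len(data)} bytes  {data.hex()}")
```

None of the three forms represents this character in a single 16-bit
unit, so "double byte" cannot be a synonym for Unicode.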

To see an example of how double byte encodings and Unicode are different,
take a look at
http://www.microsoft.com/globaldev/reference/dbcs/932/932_98.htm. This
shows one page of MS codepage 932, which is MS's Shift-JIS implementation.
This page shows the characters obtained by a double-byte sequence having a
lead byte of 0x98. The chart shows the Unicode scalar value for each
character according to the value of the trailing byte (e.g. <0x98 0x45>
gives U+6AD3). If you look through the chart, you'll see that it's an ad
hoc assortment of Unicode characters from the CJK Ideographs range -- there
is no systematic relationship between the double-byte values and the
Unicode code points (apart from the trivial fact that every character in
cp932 was included in Unicode).
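
The same example can be checked with a short sketch, assuming Python 3 and
its bundled "cp932" codec (which may differ in minor details from the
chart at the URL above). It decodes the double-byte sequence and then
shows how the resulting character is serialized in two Unicode encoding
forms.

```python
# Decode the cp932 double-byte sequence <0x98 0x45> cited in the message.
raw = bytes([0x98, 0x45])
ch = raw.decode("cp932")  # Python's built-in codec for MS codepage 932

# The message gives U+6AD3 as the Unicode scalar value for this sequence.
print(f"cp932 bytes : {raw.hex()}")
print(f"code point  : U+{ord(ch):04X}")

# The Unicode encoding forms produce byte sequences unrelated to 0x98 0x45.
print(f"UTF-16BE    : {ch.encode('utf-16-be').hex()}")
print(f"UTF-8       : {ch.encode('utf-8').hex()}")
```

The legacy bytes and the Unicode serializations share nothing but the
character they denote; converting between them requires a lookup table.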

Unicode has its own character set, and its own encoding forms. You'll find
UTR#17 a useful read (http://www.unicode.org/unicode/reports/tr17/).

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>



This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT