RE: Unicode-based Cyrillic-Latin transliteration table

Date: Tue May 29 2001 - 09:39:27 EDT

On 05/29/2001 02:02:36 AM James Williams wrote:

>Can someone please help me understand whether support for double byte is
>same as being Unicode compliant.


>Any elaboration would be greatly appreciated.

Oh, you'd like an explanation? :-)

"Double byte" refers to a variety of legacy character set encoding
standards. Unicode is a distinct standard that uses its own encoding forms.
I'm guessing that you are assuming Unicode is a 16-bit encoding standard.
That is a wrong assumption, however. (It was part of the original
philosophy, but was formally no longer true as of Unicode 2.0 in 1996. It
has taken a little time for the terminology in the printed standard to
catch up, though.)
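To make that concrete: since Unicode 2.0, characters can lie above U+FFFF, so a single 16-bit unit is not always enough; UTF-16 uses a surrogate pair for them. A minimal Python sketch (the example character U+1D11E, MUSICAL SYMBOL G CLEF, is my illustration, not from the original message):

```python
# U+1D11E lies above U+FFFF, so it cannot fit in one 16-bit code unit.
ch = "\U0001D11E"  # MUSICAL SYMBOL G CLEF

utf16 = ch.encode("utf-16-be")  # surrogate pair: D834 DD1E
utf8 = ch.encode("utf-8")       # four bytes: F0 9D 84 9E

print(utf16.hex())  # d834dd1e
print(utf8.hex())   # f09d849e
```

So "16 bits per character" stopped being true once the surrogate mechanism was defined; UTF-8, UTF-16, and UTF-32 are simply different encoding forms of the same character set.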

To see an example of how double-byte encodings and Unicode are different,
take a look at a code chart for one page of MS codepage 932, which is MS's
Shift-JIS implementation.
This page shows the characters obtained by a double-byte sequence having a
lead byte of 0x98. The chart shows the Unicode scalar value for each
character according to the value of the trailing byte (e.g. <0x98 0x45>
gives U+6AD3). If you look through the chart, you'll see that it's an ad
hoc assortment of Unicode characters from the CJK Ideographs range -- there
is no connection whatsoever between the double-byte sequence and Unicode
(except the trivial fact that every character in cp932 was included in
Unicode).

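The <0x98 0x45> -> U+6AD3 mapping can be checked directly with any cp932 codec. A minimal sketch using Python's built-in "cp932" codec (the codec name is Python's, not part of the original message):

```python
# Decode the double-byte sequence: lead byte 0x98, trailing byte 0x45,
# using Microsoft's codepage 932 (Shift-JIS) mapping.
data = bytes([0x98, 0x45])
ch = data.decode("cp932")

print(f"U+{ord(ch):04X}")  # U+6AD3

# Round-trip: encoding the Unicode character yields the same two bytes,
# but nothing about 0x9845 predicts the value 0x6AD3 -- the mapping
# is an arbitrary table lookup, not an arithmetic relationship.
assert ch.encode("cp932") == data
```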
Unicode has its own character set, and its own encoding forms. You'll find
UTR#17 a useful read.

- Peter

Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <>

This archive was generated by hypermail 2.1.2 : Fri Jul 06 2001 - 00:18:17 EDT