From: Doug Ewell (firstname.lastname@example.org)
Date: Thu Aug 28 2003 - 23:30:03 EDT
Lars Marius Garshol <larsga at garshol dot priv dot no> quoted Marco
> | It seems that the IT world has a new acronym: "QBCS". I understand
> | that it stands for "quadra-byte character set", and I heard it used
> | to refer to GB 13030.
> | My question is: is it just a fancy sinomym for GB 13030 or can it also
> | refer to Unicode or other encodings?
The original term "DBCS," or "double-byte character set," refers to a
variable-width encoding where each character requires either one or two
bytes. East Asian legacy character encodings fall into this category.
By extension, then, a "QBCS" would be a variable-width character
encoding where each character can take anywhere from one to four bytes
-- an apt description of GB 18030.
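
To make the DBCS/QBCS distinction concrete, here is a minimal sketch that measures how many bytes each character of a sample occupies. It assumes Python's built-in "gbk" codec as a stand-in for a typical East Asian DBCS and its "gb18030" codec for the QBCS case; the helper name char_widths and the sample characters are illustrative only.

```python
def char_widths(text, encoding):
    """Return the sorted set of byte lengths used to encode each character."""
    return sorted({len(ch.encode(encoding)) for ch in text})


# 'A' is ASCII, U+4E2D is a common Han character,
# and U+20000 lies outside the Basic Multilingual Plane.
dbcs_widths = char_widths("A\u4e2d", "gbk")
qbcs_widths = char_widths("A\u4e2d\U00020000", "gb18030")

print("gbk widths:", dbcs_widths)
print("gb18030 widths:", qbcs_widths)
```

With this sample the DBCS uses one and two bytes per character, while GB 18030 uses one, two, and four. No three-byte form turns up here.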
Paradoxically (at least to me), the term "multi-byte character set"
refers to a fixed-width encoding, such as UCS-2. The official name of
ISO/IEC 10646 is "Universal Multiple-Octet Coded Character Set."
(BTW, pet peeve: The word "acronym" should only be used to mean a
pronounceable WORD ("nym") formed from the initials of other words.
Classic examples are "scuba" and "radar." If you can figure out how to
pronounce "qbcs," more power to you, but to me it's just an
> This must be an oxymoron, in the sense that character sets don't
> really have a byte width, being completely abstract assignments of
> abstract characters to abstract numbers.
This is technically true, but the terms SBCS and DBCS are so entrenched
in the industry that it doesn't seem useful to try to deprecate them
> So what it really means must be "quadra-byte character encoding", and
> both GB 18030 and UTF-32 should fit into that category.
GB 18030, yes, because its encoded characters vary from one to four
bytes in length. UTF-32, no, because its code units are uniformly 32
bits.
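
A short side-by-side sketch of that difference, assuming Python's built-in "gb18030" and "utf-32-be" codecs (the helper name widths and the sample string are illustrative):

```python
def widths(text, encoding):
    """Return the encoded byte length of each character, in order."""
    return [len(ch.encode(encoding)) for ch in text]


# ASCII letter, BMP Han character, supplementary-plane Han character.
sample = "A\u4e2d\U00020000"

gb_widths = widths(sample, "gb18030")
utf32_widths = widths(sample, "utf-32-be")  # "-be" avoids a leading BOM

print("GB 18030:", gb_widths)
print("UTF-32:  ", utf32_widths)
```

The GB 18030 lengths differ from character to character, while the UTF-32 lengths are all four bytes, so only the former is variable-width.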