Re: [OT?] QBCS

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Aug 28 2003 - 23:30:03 EDT

    Lars Marius Garshol <larsga at garshol dot priv dot no> quoted Marco
    Cimarosti:

    > | It seems that the IT world has a new acronym: "QBCS". I understand
    > | that it stands for "quadra-byte character set", and I heard it used
    > | to refer to GB 18030.
    > |
    > | My question is: is it just a fancy synonym for GB 18030 or can it
    > | also refer to Unicode or other encodings?

    The original term "DBCS," or "double-byte character set," refers to a
    variable-width encoding where each character requires either one or two
    bytes. East Asian legacy character encodings fall into this category.
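
    To make the one-or-two-bytes point concrete, here is a minimal
    sketch in Python. Shift_JIS is my own choice of example DBCS (the
    message does not name one), and byte_widths is a hypothetical helper
    written only for this illustration.

```python
def byte_widths(text, codec):
    """Return the number of bytes each character occupies in `codec`."""
    return [len(ch.encode(codec)) for ch in text]

# "A" (U+0041) and HIRAGANA LETTER A (U+3042), encoded as Shift_JIS,
# which is assumed here as a representative DBCS.
widths = byte_widths("A\u3042", "shift_jis")
print(widths)  # the ASCII letter takes 1 byte, the kana takes 2
```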

    By extension, then, a "QBCS" would be a variable-width character
    encoding where each character can take anywhere from one to four
    bytes -- an apt description of GB 18030.
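
    The same kind of check can be run against GB 18030 itself. This is
    only a sketch using Python's built-in "gb18030" codec; the three
    sample characters are my own picks, and gb18030_widths is a
    hypothetical helper. In practice the lengths that occur are one,
    two, and four bytes.

```python
def gb18030_widths(text):
    """Return the number of bytes each character occupies in GB 18030."""
    return [len(ch.encode("gb18030")) for ch in text]

# Sample characters (assumed for illustration):
#   U+0041  LATIN CAPITAL LETTER A           (ASCII range)
#   U+4E2D  a common CJK ideograph           (two-byte range)
#   U+20000 a supplementary-plane ideograph  (four-byte range)
sample = "A\u4e2d\U00020000"
print(gb18030_widths(sample))
```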

    Paradoxically (at least to me), the term "multi-byte character set"
    refers to a fixed-width encoding, such as UCS-2. The official name of
    ISO/IEC 10646 is "Universal Multiple-Octet Coded Character Set."

    (BTW, pet peeve: The word "acronym" should only be used to mean a
    pronounceable WORD ("nym") formed from the initials of other words.
    Classic examples are "scuba" and "radar." If you can figure out how to
    pronounce "qbcs," more power to you, but to me it's just an
    abbreviation.)

    > This must be an oxymoron, in the sense that character sets don't
    > really have a byte width, being completely abstract assignments of
    > abstract characters to abstract numbers.

    This is technically true, but the terms SBCS and DBCS are so entrenched
    in the industry that it doesn't seem useful to try to deprecate them
    now.

    > So what it really means must be "quadra-byte character encoding", and
    > both GB 18030 and UTF-32 should fit into that category.

    GB 18030, yes, because its characters vary from one to four bytes in
    length. UTF-32, no, because its code units are uniformly 32 bits.
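
    A rough way to see the difference, sketched in Python.
    is_fixed_width is a hypothetical helper and only tests the sample it
    is given, so it demonstrates the distinction rather than proving
    anything about an encoding in general. "utf-32-be" is used so that
    no byte order mark is counted.

```python
def is_fixed_width(text, codec):
    """True if every character in `text` encodes to the same byte length."""
    widths = {len(ch.encode(codec)) for ch in text}
    return len(widths) == 1

# One ASCII letter, one BMP ideograph, one supplementary-plane ideograph.
sample = "A\u4e2d\U00020000"

print(is_fixed_width(sample, "utf-32-be"))  # every character is 4 bytes
print(is_fixed_width(sample, "gb18030"))    # lengths differ per character
```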

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


