For what it's worth, my personal convention is a bit different from the one
described by Murray, unless I'm discussing C APIs. Ordinarily, I use the
term "multibyte encoding" to mean "non-single-byte encoding". "Multi", to
most people including me, simply means more than one. Fixed-width
double-byte encodings, like UCS-2, are multibyte encodings, as are
fixed-width 4-byte encodings like UCS-4 and UTF-32. So are all
variable-width encodings such as UTF-8, UTF-16, Shift-JIS, etc. Any
encoding that is not limited to a single byte per character is a multibyte
encoding.

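The distinction can be sketched quickly (Python here purely for
illustration; the sample characters are my own choice, not from this
thread):

```python
# Sketch: how many bytes a single character occupies under several
# encoding forms. UTF-8 is variable-width (1-4 bytes), UTF-16 uses
# 2 bytes or a 4-byte surrogate pair, UTF-32 is fixed at 4 bytes.
# "A" is ASCII; "한" is a Hangul syllable; "𐍈" lies outside the BMP.
for ch in ("A", "한", "𐍈"):
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(repr(ch), enc, len(ch.encode(enc)), "byte(s)")
```

Only the legacy single-byte encodings (ASCII, Latin-1, and friends) keep
every character to one byte; everything above is "multibyte" in my usage.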
I've found this usage to be useful in conversation with non-specialists,
because of the rather clean distinction between older systems that deal
exclusively in bytes and newer systems that actually deal in characters.
This also saves me the need to get into the distinction between encodings
and character sets when discussing designs with non-programmers. I can just
say that all systems need to support multibyte encodings and that
multibyte-encoded test data will be used everywhere to verify that they do.
Or, for example, if someone asks the very common question, "Does Korean use
regular [sic] or double-byte characters?", I can answer with, "The Korean
encoding that we'll use is a 'multibyte encoding', so we'll need to be
careful not to create any single-byte limitations in the code...."
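To make the Korean case concrete, here's a minimal sketch using EUC-KR, one
widely used Korean encoding (my example, not one named in this thread). It
shows why "double-byte" is the wrong mental model: the encoding is
variable-width, not uniformly two bytes per character.

```python
# Sketch: EUC-KR is variable-width rather than uniformly "double-byte":
# ASCII characters take 1 byte each, while Hangul syllables take 2.
text = "A한"  # an ASCII letter followed by one Hangul syllable
print(len("A".encode("euc-kr")))   # 1 byte
print(len("한".encode("euc-kr")))  # 2 bytes
print(len(text.encode("euc-kr")))  # 3 bytes total
```

Code that assumes one byte per character, or even a fixed two, breaks on
data like this; that's exactly the single-byte limitation to avoid.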
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT