Murray Sargent wrote:
> MBCS is a generic term that includes SBCS, DBCS, and character sets with
> more than two bytes. In an operational sense UTF-8 is a kind of MBCS, since
> in order to deal with it directly (rather than translating it to UTF-16 or
> UTF-32) you have to navigate over 1 to 4 bytes. (5 and 6 are ruled out by
> recent standards activities). A cool thing about UTF-8 is that you can
> easily find the start of a character if you land on a trail byte. But you
> still have to deal with other problems of MBCS, such as ensuring that the
> text cursor (or caret) always points to the start of a character, and saving
> for the next read any partial character sequence that ends an input buffer
> (if you need to translate to UTF-16 or UTF-32).
> UTF-16 surrogate pairs have similar considerations, but they are relatively
> easy to deal with, especially if your code can already handle multicharacter
> sequences such as CR LF and combining-mark sequences.
> Again, the thing I'd recommend is Unicode enabling rather than MBCS or DBCS.
My definition of multi-byte coincides with Murray's. It is a generic term for
charsets whose characters vary in byte length. Double-byte and single-byte
charsets are fixed-width. Using "double-byte" to refer to a charset that mixes
single-byte and double-byte characters is misleading, IMHO.
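
To make Murray's two UTF-8 points concrete (finding the start of a character
when you land on a trail byte, and holding back a partial sequence that ends
an input buffer), here is a minimal Python sketch. The function names are
hypothetical, made up for illustration, and the code assumes the input is
well-formed UTF-8 of at most 4 bytes per character; it does no validation.

```python
def is_trail_byte(b):
    # UTF-8 trail (continuation) bytes always have the bit pattern 10xxxxxx.
    return (b & 0xC0) == 0x80


def char_start(buf, i):
    """Back up from index i to the first byte of the character containing it."""
    while i > 0 and is_trail_byte(buf[i]):
        i -= 1
    return i


def split_complete(buf):
    """Split buf into (complete characters, partial sequence at the end)."""
    if not buf:
        return buf, b""
    start = char_start(buf, len(buf) - 1)
    lead = buf[start]
    # The lead byte tells us how long the whole sequence should be.
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        need = 1  # not a valid lead byte; assumption: let the decoder report it
    if len(buf) - start < need:
        return buf[:start], buf[start:]
    return buf, b""
```

The caller would decode only the first part and prepend the second part to
the next chunk it reads, so a character split across two reads is never
decoded in halves. The same backing-up step keeps a caret on a character
boundary.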
-- Andrea Vine, firstname.lastname@example.org, iPlanet i18n architect

...even if it requires not really a dance with the Devil, but call it a brief
shimmy with his accountant's daughter. -- Sean Burke
http://www.netadventure.net/~sburke/

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT