"mbcs" is not one encoding, but hundreds. it means "everything byte-based,
possibly more than one byte per character".
on windows, mbcs is actually more limited than on other systems, because
windows cannot use codepages with more than 2 bytes per character. other
systems allow 3B/char or more, allow utf-8 as a system codepage with its up
to 6B/char, or allow stateful encodings with shift codes, where the same
bytes mean different things depending on the current state.
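to make "shifting codes" concrete, here is a small python sketch. python and its iso2022_jp codec are only my choice of illustration (windows cannot use such a codepage as its system codepage). the same two bytes, 0x24 0x22, are plain ascii in the default state but a single japanese character after the escape sequence ESC $ B.

```python
# the same two bytes decoded in two different shift states of iso-2022-jp
plain = b'\x24\x22'                              # default state: ascii
shifted = b'\x1b$B' + b'\x24\x22' + b'\x1b(B'    # ESC $ B -> JIS X 0208, ESC ( B -> back to ascii

as_ascii = plain.decode('iso2022_jp')    # two ascii characters: '$' and '"'
as_kana = shifted.decode('iso2022_jp')   # one character: HIRAGANA LETTER A (U+3042)

print(repr(as_ascii), repr(as_kana))
```

a program that starts reading in the middle of such a stream cannot know which state it is in without scanning back to the last escape sequence.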
in any case, each of these codepages is designed for one language or a
small set of related languages. unicode is designed for all languages,
including some for which there is no _accepted_ codepage.
mbcs is also hard to use: the same character typically has different bytes
in different codepages, so you need to know every possible codepage for
your language if you want to exchange texts and do something useful with them.
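a quick python sketch of the problem, using python's codec names for three western codepages (windows ansi 1252, dos 850, and the old mac roman; the choice of codepages is mine). the same character gets three different byte values, and reading bytes with the wrong codepage silently yields a different character instead of an error.

```python
CODEPAGES = ('cp1252', 'cp850', 'mac_roman')

def bytes_for(ch):
    """return the encoded bytes of one character in each codepage."""
    return {cp: ch.encode(cp) for cp in CODEPAGES}

encoded = bytes_for('\u00e9')  # LATIN SMALL LETTER E WITH ACUTE
for cp, b in encoded.items():
    print(cp, b.hex())

# cp850 bytes read as if they were cp1252: no error, just the wrong character
misread = encoded['cp850'].decode('cp1252')
print(repr(misread))  # U+201A SINGLE LOW-9 QUOTATION MARK, not an e with acute
```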
worse, the same byte value may be the first byte of a multi-byte character
or one of its later bytes. from a random position in a string, it can be
difficult to figure out where the character boundaries are. all the utfs
are unambiguous and efficient here.
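here is a minimal python sketch of the difference, with shift_jis as my example codepage. the katakana character U+30BD is 0x83 0x5C in shift_jis, and 0x5C is also the ascii backslash, so a naive byte search finds a backslash that is not there. in utf-8, every continuation byte has the bit pattern 10xxxxxx, so from any position you can step back to the start of the character. `utf8_char_start` is a hypothetical helper written for this example.

```python
def utf8_char_start(data: bytes, i: int) -> int:
    """step back from index i to the first byte of the utf-8 character containing it.

    continuation bytes are always 0b10xxxxxx and can never be taken for a lead byte.
    """
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i

text = 'a\u30bdb'                 # 'a', KATAKANA LETTER SO, 'b'
sjis = text.encode('shift_jis')   # trail byte of U+30BD is 0x5C
utf8 = text.encode('utf-8')       # U+30BD is e3 82 bd

print(b'\\' in sjis)   # a false match: there is no backslash in the text
print(b'\\' in utf8)   # no false match in utf-8

# index 3 is in the middle of the 3-byte sequence; it leads back to index 1
print(utf8_char_start(utf8, 3))
```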
each codepage has its own special encoding forms and semantics.
unicode makes sure that its encoding forms are easily interchangeable
without a lookup table, and it defines consistent semantics.
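to show what "without a lookup table" means, here is a python sketch that encodes a code point into utf-16 code units and utf-8 bytes with nothing but shifts and masks, and checks the result against python's own codecs. the function names are hypothetical, and the sketch stops at U+10FFFF (4-byte utf-8); it does not reject lone surrogate code points.

```python
def to_utf16(cp: int) -> list:
    """encode one code point as utf-16 code units, using only arithmetic."""
    if cp < 0x10000:
        return [cp]
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]  # surrogate pair

def to_utf8(cp: int) -> bytes:
    """encode one code point (up to U+10FFFF) as utf-8, using only shifts and masks."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])

# A, e with acute, euro sign, MUSICAL SYMBOL G CLEF (outside the bmp)
for ch in 'A\u00e9\u20ac\U0001D11E':
    cp = ord(ch)
    units = to_utf16(cp)
    # cross-check the arithmetic against python's built-in codecs
    assert to_utf8(cp) == ch.encode('utf-8')
    assert b''.join(u.to_bytes(2, 'big') for u in units) == ch.encode('utf-16-be')
    print(f'U+{cp:04X}', to_utf8(cp).hex(), [f'{u:04X}' for u in units])
```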
with windows mbcs and its limit of 2B/char, a codepage can have at most
ca. 30 000 characters, which is limiting for asian languages.
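the arithmetic behind that limit, as a python sketch under one stated assumption: ascii 0x00-0x7F stays single-byte, so only 0x80-0xFF are left for lead bytes. the second part uses the lead/trail byte ranges of gbk (the chinese codepage behind windows code page 936) and python's gbk codec to count how many of those pairs actually map to a character; i am not quoting an exact figure for that count.

```python
# assumption: a double-byte codepage that keeps ascii single-byte,
# so lead bytes can only come from 0x80-0xFF
lead_bytes = 0x100 - 0x80      # 128 possible lead byte values
trail_bytes = 0x100            # at most 256 trail byte values, in practice fewer
upper_bound = lead_bytes * trail_bytes
print(upper_bound)             # theoretical ceiling for double-byte characters

# gbk ranges: lead 0x81-0xFE, trail 0x40-0x7E and 0x80-0xFE
gbk_lead = 0xFE - 0x81 + 1
gbk_trail = (0x7E - 0x40 + 1) + (0xFE - 0x80 + 1)
print(gbk_lead * gbk_trail)    # number of possible two-byte sequences in gbk

def count_gbk_pairs() -> int:
    """count the two-byte sequences in the gbk ranges that python's gbk codec decodes."""
    trails = list(range(0x40, 0x7F)) + list(range(0x80, 0xFF))
    count = 0
    for lead in range(0x81, 0xFF):
        for trail in trails:
            try:
                if len(bytes([lead, trail]).decode('gbk')) == 1:
                    count += 1
            except UnicodeDecodeError:
                pass  # unassigned in this codec
    return count

print(count_gbk_pairs())
```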
the history is that the old codepages come from a time when storage was
scarce and precious and international text exchange was not considered.
now we have dozens of different codepages for western european languages
alone. it is a nightmare to use text in old and new ones and to exchange it
between them.
just consider the euro: the most popular western codepage, iso-latin-1
(iso 8859-1), was full, with no room for a new symbol. now everyone is
supposed to abandon this codepage in favor of iso 8859-15, which is almost
the same, except that a few characters are replaced by the euro and by some
french and finnish characters, and it is full, too. unicode just added the
euro and will add thousands more characters when they are agreed upon.
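the euro story in a few lines of python, using python's iso8859_1 and iso8859_15 codecs (the code is my illustration, not part of the standards). iso 8859-1 cannot encode the euro at all; iso 8859-15 puts it at 0xA4, where latin-1 had the generic currency sign, so the same byte means different things in the two codepages. the loop lists every byte value whose meaning changed.

```python
euro = '\u20ac'

# iso 8859-15 has the euro at 0xA4 ...
print(euro.encode('iso8859_15'))
# ... where iso 8859-1 has CURRENCY SIGN (U+00A4)
print(repr(b'\xa4'.decode('iso8859_1')))

try:
    euro.encode('iso8859_1')
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print('iso 8859-1 has a euro sign:', latin1_has_euro)

# every byte value whose character differs between the two codepages
changed = [(hex(b), bytes([b]).decode('iso8859_1'), bytes([b]).decode('iso8859_15'))
           for b in range(0xA0, 0x100)
           if bytes([b]).decode('iso8859_1') != bytes([b]).decode('iso8859_15')]
for row in changed:
    print(row)

# unicode just assigned a new code point; the utfs encode it like any other
print(euro.encode('utf-8'), euro.encode('utf-16-be'))
```

the changed list holds eight entries: the euro plus the french and finnish letters that 8859-15 squeezed in, each displacing a latin-1 symbol.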
win95/98 use mbcs internally and offer only conversion to and from ucs-2,
plus about a dozen ucs-2 api functions.
win nt/2000 uses ucs-2 internally but also provides mbcs api functions,
which are slower than the ucs-2 ones.
win ce uses ucs-2 internally and can only convert to and from mbcs. really!
microsoft is moving to unicode, and it is a member of the unicode consortium.
most unix and mainframe systems offer utf-8 locales, which is an equally
valid form of unicode.
many internet standards including html and xml use unicode, and the policy
is to do so for new ones, too.
java uses ucs-2.
msdn is pretty poor at explaining anything beyond unicode 1.1.
Markus Scherer IBM RTP +1 919 486 1135 Dept. Fax +1 919 254 6430
Unicode is here! --> http://www.unicode.org/
"Patrick" <firstname.lastname@example.org> on 99-04-14 05:59:36
Subject: What is MBCS ?
I understand the whole picture about Unicode: I have been working on it
for quite some time now ;-)
ISO 10646 relationship with Unicode,
What is the BMP,
UTF-16 extension mechanism ....
And the list goes on and on ...
But despite numerous help files and MSDN articles, I have been unable to
find the same information about MBCS: all the articles explain how to use
it, how to program using MBCS and TCHAR, etc., but none of them give a proper
So, as of now, I am still unable to answer this simple question: why
should we use Unicode, which fully works only under Windows NT, rather than
MBCS, which works on all platforms?
In other words: what are the "weaknesses" of MBCS compared to Unicode, how
was MBCS designed, and what does it include? Is it going to be present in
Windows 2000?
Thanks in advance for your valuable and prompt answer.
Web : http://www.symbian.co.uk
E-mail : email@example.com
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT