> Full Unicode with all bells and whistles is much too complicated
> to have any chance of becoming a full ASCII replacement in the
> near future. Therefore, people will stick with simple 8-bit
> character sets for many more years and we will not enter the
> 16-bit character set era very soon.
Two words: Think Java
> Just look at the many new
> ISO 8859 extension proposals to see that this is what is going
> What can we do against this problem? Not the idea of a 16-bit
> character set scares people, but the gigantic size of Unicode
> and all the complex mechanisms that are involved (bidi,
> combining chars, etc.)
I cannot quantify the "fear factor", but having been involved for
8 years now in software companies converting software to Unicode
support, I think I can state that hands-down the biggest barrier
is the jump from 8-bit to 16-bit characters. The rest is just
irrelevant noise for most implementations except the high-end
textual display and editing applications. Nearly everybody, including
the biggest of big boys--Microsoft--starts with the pared-down,
left-to-right, no-bidi, no-combining-marks subset of Unicode anyway.
Once you jump to 16-bit characters, you've taken the big first
bite. Everything else comes as incremental additions for most
> On the Web page
> there is a description of the Minimum European Subset of ISO/IEC 10646-1
> (MES) defined in ENV 1973:1995.
> Are there any plans to make MES or something close to it a new
> ISO standard?
NO! As Jonathan pointed out, it is considered a named subset of
> I think, it would be an excellent idea to draft a new standard
> ISO 15646:1998 -- Multi-byte coded character set for
> European languages
> that specifies MES or a very similar Unicode subset with around
> 1000 characters. These characters should all have the
> following properties:
> - all are from left-to-right scripts
> - all are of approximately the size of Latin characters
> (such that say 9x14 pixel character cell terminal emulators
> like xterm, VGA text-mode, and VT100 emulators can display
> them in one single cell)
This is special pleading for obsolete technology. Character cell
renderers should stick to the character sets they deal with.
Unicode is appropriate for the GUI world--and the least common
denominator there is being set by the browser/web-server vendors. I don't
hear the browser vendors lobbying for a restricted set. They are
market-driven and all need to (and already do, for the most part)
support the Asian market--which implies Japanese, Chinese,
Korean and more. And they are busy adding bidi support to do
correct handling of the Middle Eastern scripts.
> - all are non-combining characters and from scripts that
> can be represented using non-combining characters
> These are exactly the properties that characters in most ISO 8859
> parts also have, and therefore upgrading from ISO 8859-1 to
> ISO 15646 will cost only a fraction of what upgrading to full
> Unicode (with combining chars, bidi, large ideographic and dingbats
> glyphs, etc.) would cost.
This is an incorrect analysis, since the major cost is the 16-bit
hurdle. The cheap way in for Unicode support in software is
UTF-8--and that is why UTF-8 is catching on in a big way in
the Internet world.
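
To make the UTF-8 point concrete, here is a small Python sketch (not part of the original exchange). The helper names byte_view and survives_8bit_string_code are made up for illustration, and the zero-byte check is only a crude stand-in for what NUL-terminated 8-bit string code would tolerate.

```python
# Illustrative sketch only: helper names are hypothetical, and the
# zero-byte test is a rough proxy for "works with 8-bit string code".

def byte_view(text):
    """Return the raw bytes of text in UTF-8 and in 16-bit (UTF-16LE) form."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-le": text.encode("utf-16-le"),
    }


def survives_8bit_string_code(data):
    """True if data has no zero bytes, so NUL-terminated code would not truncate it."""
    return b"\x00" not in data


if __name__ == "__main__":
    for sample in ("abc", "Grüße"):
        for encoding, data in byte_view(sample).items():
            print(sample, encoding, len(data), "bytes,",
                  "8-bit safe" if survives_8bit_string_code(data) else "has zero bytes")
```

ASCII text comes out byte-for-byte unchanged in UTF-8, so existing 8-bit code paths keep working, while the 16-bit form doubles the width and introduces zero bytes that such code has to be rewritten to handle.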
> Let's make the idea of a 16-bit character set attractive to
> developers by defining an ISO character set that is, apart from
> being a 16-bit character set, in no way any more complicated than
> ISO 8859-1.
This wouldn't be attractive to any developers I know except ones
who believe their market is limited to the Latin, Greek, and
Cyrillic scripts and who underestimate the difficulty of making
the jump to 16-bit characters.
> ISO 15646 should be a superset of all ISO 8859 parts (except
> Arabic and Hebrew because of their bidi requirements), as well as
> of IBM code pages 437 (DOS) and 1252 (Windows). I think
> MES is already exactly that.
> What do you think?
And regarding your response to Jonathan that this is a marketing
issue not addressed by defining a MES subset and profile for 10646,
any attempt to introduce another standard as an alternative to
10646 would just create marketing confusion. It would also be
firmly opposed by most (all?) of the vendors committed to
> Markus Kuhn, Computer Science grad student, Purdue
> University, Indiana, US, email: firstname.lastname@example.org
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT