Re: MES as an ISO standard?

From: Adrian Havill
Date: Tue Jul 01 1997 - 22:23:19 EDT

> I definitely would like to see this as a new ISO standard with a new ISO
> standard number and a new title. Make it clearly recognizable as a different
> character set for the naive moderately computer literate person out
> there who wonders what characters her email software supports.

The moderately computer literate person? What something supports?
Hmmm... let me give this a try without using any new ISO names and
numbers and standards to write a blurb for the outside of an e-mail
software package:

"Master-Mailer 7.05 is multilingual! MM 7.05 supports the sending,
receiving, composing, and displaying of the following languages:

Afrikaans, Breton, Basque, Catalan, Croatian, Czech, Danish, Dutch,
English, Esperanto, Estonian, Faroese, Finnish, Flemish, French,
Frisian, German, Greenlandic, Hawaiian, Hungarian, Icelandic,
Indonesian, Irish, Italian, Latin, Latvian, Lithuanian, Maltese,
Norwegian, Polish, Portuguese, Proven[c]al, Rhaeto-Romanic, Romanian,
Romany, Sami, Slovak, Slovenian, Sorbian, Spanish, Swahili, Swedish,

Master-Mailer uses Unicode internally, which will allow it to easily
handle future languages with plugin modules available at no cost
to licensed owners! Master-Mailer can send and receive in UTF-7, as well
as import and export transparently to legacy standards (including
conversion to and from the ISO-8859-1 through 10 character sets... see
the fine print on the bottom right for a complete list)."

> Full ISO 10646 will not replace ASCII in the
> next 30 years in those 90% of applications that are not special
> i18n word processors. Period.

30 years ago?
> 90% developers without special i18n and global linguistic
> training who are today scared to death by bidi algorithms,
> combining characters, and representation forms. UTF-8 is already
> enough for them to worry about.

Those same developers are probably scared off by proportionally spaced
fonts, kerning, point sizes, and the like. You're complaining about
people being scared of technology that already exists and is in use in
almost every single GUI environment out there.

In other words, those that plan to use Unicode for CJK are not
intimidated by large character sets-- they've been dealing with large
sets for quite some time. Those that use languages that use BIDI are not
too intimidated by Unicode's BIDI... "seen that, done that."

It's been said many times before in reply to this thread, but few people
start off supporting FULL (every single character and every single
locale and display engine) Unicode. That's why there's so much
discussion here on how to deal with characters/glyphs/languages one
doesn't yet support (the unrenderable/last resort threads for instance).
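As a rough illustration of that kind of partial support, here is a minimal Python sketch of a routine that passes through the characters an application claims to handle and substitutes a visible placeholder for everything else. The range table and the names SUPPORTED_RANGES, is_supported, and render_safely are hypothetical, chosen only for this example; a real product would derive its coverage from its fonts and input methods.

```python
# Hypothetical coverage table: the code point ranges this imaginary
# application knows how to display (an assumption for illustration).
SUPPORTED_RANGES = [
    (0x0009, 0x000A),  # tab and newline
    (0x0020, 0x007E),  # printable ASCII
    (0x00A0, 0x00FF),  # Latin-1 Supplement
    (0x0100, 0x017F),  # Latin Extended-A
]


def is_supported(ch):
    """Return True if the character falls in one of the supported ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in SUPPORTED_RANGES)


def render_safely(text):
    """Keep supported characters; show the rest as a [U+XXXX] placeholder."""
    pieces = []
    for ch in text:
        if is_supported(ch):
            pieces.append(ch)
        else:
            # Nothing is lost silently: the reader sees which code point
            # could not be displayed.
            pieces.append("[U+%04X]" % ord(ch))
    return "".join(pieces)


print(render_safely("na\u00efve \u65e5\u672c"))
```

Adding a language later then amounts to extending the table (and shipping the fonts), while the stored text itself stays untouched.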

For example, Java started out being Unicode and barely supported anything
past the ASCII range. But the fact that they STARTED with Unicode has made
it much easier to internationalize Java.

Also, I think you meant the above as an exaggeration, but implementing
decoding/encoding routines and understanding UTF-8 is about the same
order of complexity as the bubble sort. (The concepts behind -why- UTF-8
is the way it is (ASCII compatibility, random-string location access
efficiency, etc.) are a bit more involved, and are left as an exercise
for the 1st year comp-sci student.) (^_^)
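To make the complexity comparison concrete, here is a minimal Python sketch of a UTF-8 encoder and decoder. It assumes the present-day definition of UTF-8 (sequences of one to four bytes, code points up to U+10FFFF, no surrogates), which is narrower than the longer sequences the format originally allowed. The names utf8_encode and utf8_decode are illustrative only, and this is a teaching sketch rather than a replacement for a library codec.

```python
def utf8_encode(cp):
    """Encode a single code point (an int) as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point: %#x" % cp)
    if cp < 0x80:       # 0xxxxxxx: ASCII passes through unchanged
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf8_decode(data):
    """Decode UTF-8 bytes into a list of code points (ints)."""
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte alone says how many continuation bytes follow.
        if lead < 0x80:
            cp, extra, minimum = lead, 0, 0x0
        elif 0xC0 <= lead <= 0xDF:
            cp, extra, minimum = lead & 0x1F, 1, 0x80
        elif 0xE0 <= lead <= 0xEF:
            cp, extra, minimum = lead & 0x0F, 2, 0x800
        elif 0xF0 <= lead <= 0xF7:
            cp, extra, minimum = lead & 0x07, 3, 0x10000
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + 1 + extra]:
            if (b & 0xC0) != 0x80:
                raise ValueError("bad continuation byte after offset %d" % i)
            cp = (cp << 6) | (b & 0x3F)
        # Reject overlong forms, surrogates, and out-of-range values.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point at offset %d" % i)
        code_points.append(cp)
        i += 1 + extra
    return code_points
```

Because continuation bytes always match 10xxxxxx and lead bytes never do, a reader dropped at an arbitrary offset can find the start of the next character by skipping continuation bytes, which is the location property mentioned above.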

Personally, I think 90% of the developers that do plan on i18ning their
projects are not going to be so narrow-minded as to think of just those
with "simple character sets." (There are a lot of software-hungry
computers out there in Japan and the rest of Asia that use those really
complicated character sets.)

They may start with 16-bit Unicode, and implement only the languages
they know and understand and have the know-how and budget to implement,
adding a dummy routine (not complicated) to gracefully deal with
unsupported characters. This way, the path to upgrading is relatively

There is no need to specify additional character sets, etc., because
Unicode understood that few people would be able to implement the whole
standard (as Unicode is expanding, nobody can ever implement the "full
standard") off the bat and gives hints on how to deal with unsupported

The average app user doesn't need to memorize the name of another
character set (many people need more than one character set standard to
support their ONE language!) They need only know that the product
supports conversion to and from X character sets, and the fact that it
supports enough characters, etc., from Unicode (which implies
expandability) to support the Y language. People don't write in
character sets. They write in languages. Unicode supports almost every
modern language in existence. Yet an individual app can support Unicode
and not have to support every known modern language.

Adrian Havill
Engineering Division, System Planning & Production Section
