Re: MES as an ISO standard?

From: Markus G. Kuhn (
Date: Tue Jul 01 1997 - 23:31:12 EDT

Adrian Havill wrote on 1997-07-02 02:23 UTC:
> > 90% developers without special i18n and global linguistic
> > training who are today scared to death by bidi algorithms,
> > combining characters, and representation forms. UTF-8 is already
> > enough for them to worry about.
> Those same developers are probably scared off by proportionally spaced
> fonts, kerning, point sizes, and the like. You're complaining about
> people being scared of technology that already exists and is in use in
> almost every single GUI environment out there.

Yes, full ISO 10646 and Unicode are nice for GUI environments. But
there are non-GUI environments in wide use that could easily be
extended to around 1000 characters, that do not care today about
good typography (proportional fonts, kerning, ligatures, etc.) and
therefore will also not care about combining characters, bidi,
and representation forms.

I would like to see Unicode used in these systems, too, to support
more (as many as feasible) languages there, and that is why I propose
ISO 15646, a simple, small 1000-character Unicode subset for these
non-GUI systems. Otherwise these systems will stick with ASCII
or ISO 8859-1 as they do today.

I am talking about library computers, the software that prepares your
tax returns and your bank account statements, about programming
environments and medical data processing systems. None of these
applications care in any way about high quality typography and most
of them use user interface layouts depending on monospaced fonts, but
they could fairly simply be extended to around 1000 characters to
cover at least all Latin/Cyrillic/Greek and some more scripts.

> It's been said many times before in reply to this thread, but few people
> start off supporting FULL (every single character and every single
> locale and display engine) Unicode. That's why there's so much
> discussion here on how to deal with characters/glyphs/languages one
> doesn't yet support (the unrenderable/last resort threads for instance).

Ok, then let's write a well-defined ISO standard of what this "not FULL
Unicode" means exactly, in order to give implementors a well-defined
realistic intermediate goal to make the way to Unicode more attractive,
especially for those working in non-GUI environments. With MES, most
of the work has essentially already been done anyway.

> For example, Java started out being Unicode and barely supported past
> the ASCII range.

Yes, leaving lots of confusion. Am I allowed to use combining characters
in conforming Java variable names? Will they link to precomposed
characters? The specification says "Don't know". With ISO 8859-1
or our hypothetical ISO 15646, questions like these would not even arise.
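
To make the combining-character question concrete, here is a minimal
sketch in Python (not Java, and not taken from any Java specification).
It uses the standard unicodedata module; the helper name same_identifier
and the choice of NFC as the comparison policy are assumptions made only
for this illustration.

```python
import unicodedata

# Two ways to write the same visible letter.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT, two code points


def same_identifier(a: str, b: str) -> bool:
    """Compare two names after NFC normalization.

    Hypothetical policy for illustration: a language or linker would have
    to state explicitly whether it applies a step like this.
    """
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# A plain code-point comparison treats the two spellings as different names.
print(precomposed == combining)

# With normalization they compare equal.
print(same_identifier(precomposed, combining))
```

A repertoire without combining characters never has to make this choice,
because each letter has exactly one spelling.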

> But the fact that they STARTED with Unicode has made it
> much more easy to internationalize Java.

Fully agreed. In the same way, starting with ISO 15646 would mean
starting with Unicode, as its code table is a strict subset and as it
has the same 16-bit encoding.

> Also, I think you meant the above as an exaggeration, but implementing
> decoding/encoding routines and understanding UTF-8 is about the same
> order of complexity as the bubble sort (the concepts behind -why- UTF-8
> is the way it is (ASCII compatibility, random-string location access
> efficiency, etc.), is a bit more involved, and is left as an exercise
> for the 1st year comp-sci student. (^_^)

I shipped my first product with full UTF-8 support back in early 1993,
when UTF-8 was still called UTF-FSS, almost five years ago. I still feel
VERY lonely and really miss the company of others offering software with
UTF-8 support. Plan 9 is so far the only system where UTF-8 really
flies. I am pretty sure that, in my graduation year, I am the only
CS graduate student who has even heard of UTF-8 (excluding the
five to whom I explained it myself). Even trivial things like UTF-8
are still widely unknown technology except to the few i18n experts
here on the list. I bet less than 3% of all computer science professors
even know what UTF-8 is!
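
For readers who have not seen it, the encoding step really is short. The
following is a minimal sketch in Python, not the code from any shipped
product. The function name utf8_encode is hypothetical, and the sketch
assumes the modern definition of UTF-8, which stops at U+10FFFF and
excludes surrogates; the original UTF-FSS design allowed longer
sequences.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 7 bits: one byte, identical to ASCII.
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Cross-check against Python's built-in encoder.
for ch in ["A", "\u00e9", "\u20ac"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
```

Every character in a 16-bit repertoire falls into one of the first three
branches, so it needs at most three bytes.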

> Personally, I think 90% of the developers that do plan on i18ning their
> projects are not going to be so narrow minded and think of just those
> with "simple character sets."

ISO 15646 is intended for people who DO NOT EVEN THINK ABOUT an i18n
strategy, but who are just looking for the best character set they can
handle. Their reasoning would be: ISO 15646 supports 100 languages,
ISO 8859-1 only 10, and the implementation effort is the same, so why
not use ISO 15646, even though we only develop for the U.S. market?
This way, we cover at least all of Europe. And if a Japanese customer
is later interested, then upgrading from ISO 15646 to Unicode is easier
than if we had started with 8-bit, because we already have at least the
16-bit infrastructure.

What I am worried about is that there is still such a lot of interest in
8-bit character sets. Look at the new French ISO 8859 Latin-0 proposal for

It seems 8-bit charsets are not yet dead. I want to kill
them, but Unicode with its >40 000 characters is certainly no threat to
those nice simple 8-bit charsets.

> They may start with 16-bit Unicode, and implement only the languages
> they know and understand and have the know-how and budget to implement,
> adding a dummy routine (not complicated) to gracefully deal with
> unsupported characters. This way, the path to upgrading is relatively
> painless.

Ok, so let's just standardize this "only the languages they know" part
of what you just mentioned, and then you get the idea of what
I suggested.
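
As a rough illustration of such a "dummy routine" working against a fixed
repertoire, here is a minimal sketch in Python. The block ranges below
are hypothetical placeholders chosen for this example; they are NOT the
actual MES character list. The names SUPPORTED_RANGES, is_supported, and
filter_text are likewise made up for the illustration.

```python
# Hypothetical repertoire: illustrative block ranges only, not the MES list.
SUPPORTED_RANGES = [
    (0x0020, 0x007E),  # Basic Latin (printable ASCII)
    (0x00A0, 0x017F),  # Latin-1 Supplement and Latin Extended-A
    (0x0370, 0x03FF),  # Greek block
    (0x0400, 0x04FF),  # Cyrillic block
]

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def is_supported(ch: str) -> bool:
    """Return True if the character lies in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in SUPPORTED_RANGES)


def filter_text(text: str) -> str:
    """Replace every character outside the repertoire with U+FFFD."""
    return "".join(ch if is_supported(ch) else REPLACEMENT for ch in text)


sample = "Gr\u00fc\u00dfe, \u0395\u03bb\u03bb\u03ac\u03b4\u03b1, \u041c\u043e\u0441\u043a\u0432\u0430"
print(filter_text(sample) == sample)           # text inside the subset is unchanged
print(filter_text("\u65e5\u672c") == "\uFFFD\uFFFD")  # characters outside are replaced
```

With a standardized subset, the table of ranges would come from the
standard instead of being chosen by each implementor.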

Admit it, you only dislike it because it excludes
the Japanese character sets, right? Think of ISO 15646 more as a
replacement for ISO 8859 and not as a replacement for Unicode.

Maybe ISO 18859 would be a better number for MES ... ;-)


Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email:
