Re: MES as an ISO standard?

From: Adrian Havill (havill@threeweb.ad.jp)
Date: Wed Jul 02 1997 - 02:20:03 EDT


I wrote:
> For example, Java started out being Unicode and barely supported past
> the ASCII range.

Markus Kuhn replied:
> Yes, leaving lots of confusion. Am I allowed to use combining characters
> in conforming Java variable names? Will they link to precomposed
> characters? The specification says "Don't know".

What? Did you just make that up? The "specification" does not say "don't
know"-- on the contrary, it is very clear about this. Please re-read
"The Java Language Specification," section 3.8 (Identifiers). To
summarize that section: a) yes, you can use decomposed characters, and b)
an identifier written with a composite character is different from one
written with the decomposed sequence. "Two identifiers are the same only
if they ... have the same Unicode character [exact character sequence]
for each letter or digit."
As for which Unicode characters are considered letters and which digits
in Java, this is spelled out in sections 20.5.17 and 20.5.18.
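
To make the point about identifiers concrete, here is a minimal sketch.
The class and method names (IdentifierForms, isValidIdentifier,
sameIdentifier) are made up for illustration, and it uses
Character.isJavaIdentifierStart/isJavaIdentifierPart from current JDKs
rather than the methods listed in the 1997 specification. It only checks
the character classes (it does not reject keywords) and compares the two
spellings code unit by code unit, which is how I read section 3.8.

```java
class IdentifierForms {
    // "cafe" with the last letter as U+00E9, the precomposed e-acute.
    static final String PRECOMPOSED = "caf\u00e9";
    // "cafe" followed by U+0301, the combining acute accent (decomposed form).
    static final String DECOMPOSED = "cafe\u0301";

    // True if every character is allowed in a Java identifier.
    // Assumption: keywords and reserved literals are not checked here.
    static boolean isValidIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Identifiers match only if the exact character sequences match;
    // no normalization is applied.
    static boolean sameIdentifier(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(isValidIdentifier(PRECOMPOSED));
        System.out.println(isValidIdentifier(DECOMPOSED));
        System.out.println(sameIdentifier(PRECOMPOSED, DECOMPOSED));
    }
}
```

Both spellings pass the character-class check, yet they compare as
different names, so a variable declared with one form is not found under
the other.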

> I shipped my first product with full UTF-8 support back in early 1993,
> when UTF-8 was still called UTF-FSS, soon five years ago. I still feel
> VERY lonely and really miss the company of others offering software with
> UTF-8 support. Plan9 is so far the only system where UTF-8 really
> flies.

VRML 2.0, Netscape Communicator, HotJava, I think the next edition of IE
(not sure on this one), Wiley, the internals of Java .class files and
.jar files, and many other Java tools support UTF-8, to name a few
quickly off the top of my head.

> Ok, so let's just standardize this "only the languages they know" part
> of what you just mentioned and then you get the idea of what
> I suggested.

Fine, it's changed. Write a 1000-character set "standard." Call it
whatever you like. Just make sure that it's 16-bit, that you leave the
characters outside of the set you support unchanged (perhaps print
<U+xxxx> in your plain-CRT whenever you have to display one of these
characters), and that you don't redefine/add/change/move any Unicode
character.
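
As a rough sketch of what I mean by leaving unsupported characters
alone, the following keeps the stored text untouched and only
substitutes <U+xxxx> at display time. Everything here is hypothetical:
the class name SubsetRenderer, the method names, and above all the
"supported" range (printable ASCII plus Latin-1), which merely stands in
for whatever subset an implementation actually chooses.

```java
class SubsetRenderer {
    // Assumption: a stand-in subset of printable ASCII plus Latin-1.
    // A real subset would define its own membership test.
    static boolean isSupported(char c) {
        return (c >= 0x0020 && c <= 0x007E) || (c >= 0x00A0 && c <= 0x00FF);
    }

    // Builds a display string; the input text itself is never modified.
    // Each unsupported 16-bit code unit is shown as <U+XXXX>.
    static String toDisplay(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isSupported(c)) {
                out.append(c);
            } else {
                out.append(String.format("<U+%04X>", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+3042 is HIRAGANA LETTER A, outside the stand-in subset.
        System.out.println(toDisplay("A\u3042B"));
    }
}
```

Because the substitution happens only on output, text that passes
through such a program still round-trips with every Unicode character at
its original code point.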

Call it whatever you like. The name, whether it be 15646/whatever, means
nothing-- it's still Unicode by any other name. The name I care about is
already standardized: the language name. I won't ask "can it support
ISOwhatever." I'll ask, "Can it support the languages French, German,
Esperanto and Italian using Unicode?"

> Admit it, you only don't like it because it excludes
> the Japanese character sets, right? Think of ISO 15646 more as a
> replacement for ISO 8859 and not as a replacement for Unicode.

No, I don't like it because you're reinventing the wheel-- adding yet
another standard when one isn't needed. Your concern seems to be that
Unicode is too complicated and we need this standard x to encourage
developers to minimally support set y as an intermediate stepping stone
to the future. This isn't needed.

I think of Unicode as the successor to ISO-8859-1 and Co. As for your
worry about Unicode not becoming accepted due to it being too
complicated, don't worry. It's already accepted by the industry (past
tense). Anything new (read: built with expansion-- more than English and
one other language-- in mind and not targeted exclusively at legacy
systems) being developed is being done in Unicode. It's a shame that
Unicode doesn't have its own little icon on the outside of products
like "Intel Inside" or "100% Pure Java" or "Java Compatible" (imagine
seeing software products that said "Unicode Compatible").

-- 
Adrian Havill <URL:http://www.threeweb.ad.jp/>
Engineering Division, System Planning & Production Section


