Re: Comments on <draft-ietf-acap-mlsf-00.txt>

From: Martin J. Duerst (mduerst@ifi.unizh.ch)
Date: Fri Jun 06 1997 - 06:05:11 EDT

Next message: BERLAND Remi: "(no subject)"
Previous message: Martin J. Duerst: "Re: Yet another Unihan Q"
Maybe in reply to: Martin J. Duerst: "Re: Comments on <draft-ietf-acap-mlsf-00.txt>"
Next in thread: Martin J. Duerst: "Re: Comments on <draft-ietf-acap-mlsf-00.txt>"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Oh, yes, Sakamura.

Examples like those below do Sakamura and the other
anti-Unicoders very little credit. Whether they arise from
ignorance or are made on purpose doesn't change much.

Things like these make it very difficult for people who know
the details to agree to claims such as "we need language codes
in-band" (exactly what Sakamura has) "to avoid CJK problems"
(which are between 10 and 100 times smaller than perceived,
and which can in any case only be solved to about 30-50%).

Even if people such as Mark Crispin and Chris Neuman have
considerable experience in the area of multilingual text
processing, with regard to Han and in general, there are
many people on the Unicore list who have considerably
more knowledge and experience, and it's quite reasonable
that they ask "what do you want this language tagging to be
used for?".

On the other hand, I can understand Mark and Chris who have
fought many fights for Unicode. If you need help somewhere,
please tell me (I tried to subscribe to the ACAP list, but
with no success yet).

On Thu, 5 Jun 1997, Adrian Havill wrote:

> If you want a complicated character system that does tags and
> everything, there are plenty to choose from-- Unicode basher Prof. Ken
> Sakamura (U. of Tokyo) and Co. would be more than happy to tout the
> virtues of TRON, which is loaded with escape sequences galore. The TRON
> project has made a religion out of bad-mouthing Unicode, much like the
> computer industry has made a religion out of bad-mouthing a certain
> software firm in Redmond, Washington (who make a darn fine Unicode based
> OS, I might add). They have to-- they have to justify that the years of
> blood, sweat, tears (and most importantly, money) they've used making
> -their- worldwide standard character set has not repeated work that's
> already here and in use and better.
>
> (see
> <URL:http://www.personal-media.co.jp/vs/mltp96/keynote/keynote_e.html>

Oh yes, Sakamura. He is a very bright and hard-working person, and
it's always sad to see him mix up fact and fiction and create rumors
and lies as in this paper. I have seen this published in many places,
but that doesn't make the contents any more true. As an example, please
have a look at the end of section 3.1.2. It contains two examples of
how rumors are spread that are absolutely ridiculous but very
difficult to correct.

The first is the capital A/Alpha story, where Sakamura complains
that it appears three times, in Latin, Greek, and Cyrillic, which
is unfair, because Han was forced to unify, but the Europeans were
not. Well, there are all kinds of arguments about this, but there
is one that is clear-cut. Unicode, for good reasons, has the source
separation rule: characters that a source standard encodes
separately stay separate in Unicode. The first major standard that
has all these three separate is JIS 208. So Japanese shouldn't
complain :-).
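
To make the A/Alpha point concrete, here is a minimal sketch in
Python using only the standard unicodedata module. The names
LOOKALIKES and describe() are mine, for illustration only. It does
nothing more than show that the three look-alike capitals sit at
three separate code points:

```python
import unicodedata

# Three capitals that look alike in most fonts but are encoded separately.
LOOKALIKES = ["\u0041", "\u0391", "\u0410"]  # Latin, Greek, Cyrillic


def describe(ch):
    """Return the code point and the Unicode character name of ch."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


for ch in LOOKALIKES:
    print(describe(ch))

# Comparing the characters confirms they are three distinct code points.
print(len(set(LOOKALIKES)) == 3)
```

Whether your terminal can tell the three apart visually depends on
the font, which is rather the point.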

Just after that example, there is a paragraph about some
shapes/codepoints/glyphs/characters that, when you read it,
makes you think Unicode really got that wrong. But have a look at
the shapes in Figure 1. The difference between a) and b) is
in the upper right. The difference between b) and c) is in
the lower middle. The difference between a) and b) is a full
stroke, the difference between b) and c) just a small hook.
So it looks totally wrong that a) and b) have the same
Unicode codepoint (U+8FC0), but c) is at U+8FC2. And Sakamura
is trying to make a point of this, and is succeeding with the
inattentive reader. However, readers that know a little bit
of Kanji history know that the difference between a) and
b) is a regular style difference which appears in many
characters and never distinguishes two characters.
Because the shapes in question are not very well known, it
is more difficult to assert that b) and c) are indeed
different characters, with different pronunciations,
and historically not related. But a check in a dictionary
will quickly confirm this. Also, a reader can guess this
from the fact that the difference between them appears
as a significant difference in some other characters
(it's not always a significant difference; for example
the "wood" radical and others can be written both ways,
even in Japanese schools!).
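
For those who want to look at the two code points themselves, a
small sketch, again in Python with the standard unicodedata module
(the names PAIR and is_stable() are made up for this mail). It only
shows that U+8FC0 and U+8FC2 are separate unified ideographs that
no normalization form folds together. It says nothing about which
shape in Figure 1 a given font will draw for either of them:

```python
import unicodedata

# The two code points cited above.
PAIR = ("\u8fc0", "\u8fc2")


def is_stable(ch):
    """True if no Unicode normalization form changes ch."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(f, ch) == ch for f in forms)


for ch in PAIR:
    # Print the code point, its character name, and the stability check.
    print("U+%04X" % ord(ch), unicodedata.name(ch), is_stable(ch))
```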

Regards, Martin.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:34 EDT