RE: FW: Unicode Hangul and Internet

From: Chris Pratley (chrispr@microsoft.com)
Date: Wed Apr 21 1999 - 00:24:26 EDT


In addition to John's comments, I would add that I strongly disagree with
this statement from the original post:
> I think that the space occupied by the useless Korean syllable groups
> in the Unicode set prevented Unicode from becoming a really universal
> character set, used all over the world.

In reality, Unicode *is* being used all over the world. Without trying to
sound too grandiose, it is important to realize that Unicode is incredibly
widely used today; the people using it just don't realize it. Around the
world, and within Asia in Japan and all parts of China, it is a safe bet
that more than 50% of text being written today on computers is stored as
Unicode (Microsoft Word97, Word98, and JustSystems' Ichitaro). In Asia, over
50% of Internet content, and in particular around 65% of Korean content, is
now viewed in Unicode (Internet Explorer), even if the web content itself is
not stored in Unicode. This trend is continuing: Word97 Korean is already
Unicode, Hangul and Computer plans to move their AreA Hangul word processor
to Unicode, and Navigator 5 will be based on Unicode. Besides word
processors and browsers, there are other programs used every day in Korea,
like Excel97 and PowerPoint97, that use precomposed Unicode Hangul
syllables. And Office2000 will add Access2000, based on Unicode, using
precomposed Hangul. So Unicode "non-adoption" is just a myth.

It is also twisting history to say that using a code point for each
combination of Jamos delayed adoption of Unicode. In fact, it very much
*hastened* it.
Precomposed Hangul syllables are the easiest way to get high quality Hangul
glyphs. Composing Jamos in software is not terribly difficult, but getting a
high quality result that matches the quality of pre-composed text is much
harder for the developer than simply using precomposed characters. Further,
integration with existing Korean national standards was much easier with
precomposed characters since those existing standards used precomposed
characters too. This is not to say that combining Jamos are not desirable,
especially to handle old Hangul syllables. But it is a fact that software
that handles combining Jamos is rare today, and software that handles
precomposed Unicode Hangul is the norm.

There are much stronger barriers to Unicode adoption than Korean
pre-composed syllables, namely the lack of Unicode support in many
operating systems. But all popular OSes today can handle Unicode to some
degree, so people who want such functionality can write Unicode software,
and they are doing it.

Chris Pratley
Microsoft Office Program Manager

-----Original Message-----
From: John Cowan [mailto:cowan@locke.ccil.org]
Sent: April 20, 1999 7:24 PM
To: Unicode List
Subject: Re: FW: Unicode Hangul and Internet

> > As you know, several character sets are actually used for
> > representing Far-Eastern languages. While some of these character
> > sets do not even list the Korean alphabet, Unicode seems to go in the
> > opposite direction, because it reserves for the Korean syllables more
> > than 11,000 positions, not considering that it is possible to obtain
> > the Korean "graphical syllable" by means of software: see, for
> > instance, the Microsoft Global IME (Input Method Editor) 5.0.

Those 11K codepoints (the Johab code set) were put in at the request
of the Korean national standards body. But nobody has to use them;
Unicode has a very complete Hangul jamo set and rules for mapping
jamo sequences to Johab codes algorithmically. In addition, some
pre-modern hangul cannot be represented as single codes; the
conjoining jamo must be used.
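
The algorithmic mapping mentioned above can be sketched in a few lines.
This is a minimal illustration, not text from the original message: it
assumes the arithmetic given in the Unicode Standard for modern Hangul
(19 leading consonants, 21 vowels, 27 trailing consonants plus "none",
with precomposed syllables starting at U+AC00). The function name
compose_syllable is hypothetical.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable arithmetic.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one before the first trailing consonant (jongseong)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT counts "no trailing consonant"


def compose_syllable(lead, vowel, trail=None):
    """Map a modern conjoining jamo sequence to its precomposed syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = 0 if trail is None else ord(trail) - T_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a modern leading consonant / vowel pair")
    if trail is not None and not (1 <= t_index < T_COUNT):
        raise ValueError("not a modern trailing consonant")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    jamo = "\u1112\u1161\u11AB"  # HIEUH + A + NIEUN
    syllable = compose_syllable(*jamo)
    print(f"U+{ord(syllable):04X}")
    # Python's built-in NFC normalization performs the same composition.
    print(syllable == unicodedata.normalize("NFC", jamo))
    print(L_COUNT * V_COUNT * T_COUNT)  # size of the precomposed block
```

Pre-modern jamo fall outside these index ranges, so the sketch rejects
them; such text stays as a conjoining jamo sequence.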

> > The problem of the Korean syllable is similar to the problem of the
> > right-to-left direction used for Arabic or for Hebrew, or to the up-
> > to-bottom direction of the writing system of Inner Mongolia. For
> > representing correctly the Hangul in the Web pages the solution
> > probably has to pass through XML, but it is useless to have so many
> > positions occupied in Unicode when the problem could be easily solved
> > by software (and, moreover, every Korean could understand the Hangul
> > even if not graphically grouped in syllables). Those 11,000 positions
> > could be precious in order to reach a unified 16-bit character set
> > good for all the languages of the world.

Not when there is talk of 90K or 100K of hanzi floating around.

> > It is necessary to
> > obtain a revision of all the Far-Eastern character sets now used for
> > the exchange of data (especially for Internet) with the intent of
> > reaching as soon as possible a unified 16-bit character set good for
> > all the world, Far-East included.

That's what Unicode *is*, warts and all. Yes, it's a wart that we
need to go past 65536 codepoints, but it's a small wart, all things
considered.
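
For readers wondering what going past 65536 codepoints looks like in a
16-bit encoding, here is a small sketch. It assumes the surrogate-pair
scheme of UTF-16, which the message above does not name explicitly; the
function name to_utf16_code_units is hypothetical.

```python
def to_utf16_code_units(code_point):
    """Return the list of 16-bit code units UTF-16 uses for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    if code_point < 0x10000:
        # Everything in the first 65536 positions fits in one unit.
        return [code_point]
    # Beyond that, split the 20-bit offset across a high and a low surrogate.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


if __name__ == "__main__":
    for cp in (0xD55C, 0x20000):
        units = to_utf16_code_units(cp)
        print(f"U+{cp:04X} ->", " ".join(f"{u:04X}" for u in units))
```

A precomposed Hangul syllable such as U+D55C still takes a single
16-bit unit; only characters beyond U+FFFF need two.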

--
John Cowan      http://www.ccil.org/~cowan              cowan@ccil.org
        You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
        You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
                Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


