Re: FW: Unicode Hangul and Internet

From: John Cowan (cowan@locke.ccil.org)
Date: Tue Apr 20 1999 - 17:46:13 EDT


> > As you know, several character sets are actually used for
> > representing Far-Eastern languages. While some of these character
> > sets do not even list the Korean alphabet, Unicode seems to go in the
> > opposite direction, because it reserves for the Korean syllables more
> > than 11,000 positions, not considering that it is possible to obtain
> > the Korean "graphical syllable" by means of software: see, for
> > instance, the Microsoft Global IME (Input Method Editor) 5.0.

Those 11K codepoints (the 11,172 precomposed syllables of the Johab
code set) were put in at the request of the Korean national standards
body. But nobody has to use them; Unicode has a very complete set of
conjoining Hangul jamo, plus rules for mapping jamo sequences to the
precomposed codes algorithmically. In addition, some pre-modern
hangul cannot be represented as single codes; the conjoining jamo
must be used.
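
To make the jamo-to-syllable mapping concrete, here is a minimal Python
sketch of the arithmetic, assuming the usual layout of the conjoining
jamo (leading consonants from U+1100, vowels from U+1161, trailing
consonants from U+11A8) and the syllable block starting at U+AC00. The
function and constant names are mine, purely for illustration; only
unicodedata.normalize is a real library call.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
# T_COUNT counts the 27 trailing consonants plus "no trailing consonant".
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_syllable(lead, vowel, trail=None):
    """Map a modern conjoining-jamo sequence to its precomposed syllable."""
    l = ord(lead) - L_BASE
    v = ord(vowel) - V_BASE
    t = 0 if trail is None else ord(trail) - T_BASE
    ok_trail = t == 0 if trail is None else 1 <= t < T_COUNT
    if not (0 <= l < L_COUNT and 0 <= v < V_COUNT and ok_trail):
        # Pre-modern jamo fall outside these ranges and have no single code.
        raise ValueError("not a modern conjoining jamo sequence")
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


# 19 * 21 * 28 leading/vowel/trailing combinations give the "11K" block.
print(L_COUNT * V_COUNT * T_COUNT)                    # 11172

han = compose_syllable("\u1112", "\u1161", "\u11AB")  # HIEUH + A + NIEUN
print(hex(ord(han)))                                  # 0xd55c

# Canonical composition (NFC) performs the same mapping.
print(unicodedata.normalize("NFC", "\u1112\u1161\u11AB") == han)  # True
```

Sequences that use pre-modern jamo raise ValueError in this sketch,
which mirrors the point above: they stay as conjoining jamo.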

> > The problem of the Korean syllable is similar to the problem of the
> > right-to-left direction used for Arabic or for Hebrew, or to the
> > top-to-bottom direction of the writing system of Inner Mongolia. To
> > represent Hangul correctly in Web pages, the solution
> > probably has to pass through XML, but it is useless to have so many
> > positions occupied in Unicode when the problem could be easily solved
> > by software (and, moreover, every Korean could understand the Hangul
> > even if not graphically grouped in syllables). Those 11,000 positions
> > could be precious in order to reach a unified 16-bit character set
> > good for all the languages of the world.

Not when there is talk of 90K or 100K hanzi floating around; freeing
11K positions would not make 65536 codepoints enough.

> > It is necessary to
> > obtain a revision of all the Far-Eastern character sets now used for
> > the exchange of data (especially for Internet) with the intent of
> > reaching as soon as possible a unified 16-bit character set good for
> > all the world, Far-East included.

That's what Unicode *is*, warts and all. Yes, it's a wart that we
need to go past 65536 codepoints, but it's a small wart, all things
considered.
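
The message above does not say how one gets past 65536. As an
illustration only, the mechanism UTF-16 uses is the surrogate pair: a
code point above U+FFFF is split into two 16-bit units drawn from
reserved ranges. A minimal Python sketch follows; the function name is
hypothetical and the built-in codec is used only as a cross-check.

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    v = cp - 0x10000                 # 20-bit offset past the 16-bit range
    high = 0xD800 + (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogates(0x20000)
print(hex(high), hex(low))           # 0xd840 0xdc00

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x20000).encode("utf-16-be")
print(encoded == high.to_bytes(2, "big") + low.to_bytes(2, "big"))  # True
```

Two 10-bit halves give 1024 * 1024 = 1,048,576 extra code points, which
is room enough for the 90K or 100K hanzi mentioned earlier.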

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
	You tollerday donsk?  N.  You tolkatiff scowegian?  Nn.
	You spigotty anglease?  Nnn.  You phonio saxo?  Nnnn.
		Clear all so!  'Tis a Jute.... (Finnegans Wake 16.5)


