RE: Korean line breaking rules : Unicode 3.0 (p. 124)

From: Hoon Kim (hpkim@basistech.com)
Date: Thu Mar 02 2000 - 11:06:00 EST


I mostly agree with Mr. Shin.

The role of the space character in Korean text is slightly different from
its role in Western languages. The use of spaces was introduced relatively
recently (less than a couple of centuries ago), and one of the main reasons
for using them is to improve readability. Since some Korean words (also
known as "Eojeol") are visually longer than others, breaking lines only
between Eojeols makes the printed text look jagged and ugly. For that
reason, book publishers have traditionally set text in justified mode,
allowing line breaks between two Korean Eumjeol (syllables).

If you have Nadine Kano's Developing International Software for Windows 95
and Windows NT, please take a look at p. 244:

        Korean words expressed in hangul are separated by spaces,
        as they are in Western languages. Some Korean-language
        applications allow the user to choose whether or not to break
        lines between hangul characters.

The above paragraph may be a little misleading, and it has probably caused
some confusion among non-Korean I18N developers working on Korean software
applications.

However, the book also addresses "Gumchik", stating that it is equivalent
to the Japanese Kinsoku rule. (In fact, Gumchik and Kinsoku are written
with the same Hanja / Kanji, meaning "prohibition rules".) I really don't
know who defined the "Gumchik" provision, but some applications (especially
Korean word processors, e.g. HunMinJeongEum) use this rule, and I think it
reflects how you would break a line between two Hangul syllables.
(However, Gumchik seems to be intended mainly for rendering text in printed
media.)

Line breaking at spaces may be easy for application developers to implement
at first, but it should not be considered the "default" line breaking mode
for Korean text, IMHO. The significance of proper line breaking will
increase as the world goes more digital.

Hoon Kim
Software Engineer
Basis Technology Inc.
www.basistech.com
+1-617-252-5636

-----Original Message-----
From: Jungshik Shin [mailto:jshin@pantheon.yale.edu]
Sent: Wednesday, March 01, 2000 10:23 PM
To: Unicode List
Subject: Korean line breaking rules : Unicode 3.0 (p. 124)

On Sun, 13 Feb 2000, Kenneth Whistler wrote:

> Lest anyone feel unduly constrained, let me note that now that
> the editorial committee has closed the book, so to speak, on Unicode 3.0,
> all of you who are about to open the book for the first time should
> feel free to unleash your commentary on the text.

   I've just received my copy of the Unicode 3.0 book, so here goes
my first commentary.

   On page 124 (section 5.15, Locating Text Element Boundaries),
the third paragraph has the following near the end:

U3.0> In particular, word, line, and sentence boundaries will need to
U3.0> be customized according to locale and user preference. In Korean,
U3.0> for example, lines may be broken either at spaces (as in Latin text)
U3.0> or on ideographic boundaries (as in Chinese).

  First of all, it's a great mystery to me how on earth this
strange notion of Korean having *two* different line breaking rules (as
opposed to one) crept into the expertise of non-Korean experts on Korean
and finally made it into the Unicode 3.0 book and the Unicode TR on line
breaking.

  None of the dozens of Korean books on my bookshelves that I've just
gone through breaks lines *exclusively* at spaces. All of them
break lines freely at *syllables*. The only places I can think of where
lines are broken *exclusively* at spaces (for Korean text) are web
browsers like Netscape and MS IE, which are completely *broken* as far as
Korean line breaking is concerned, and possibly earlier implementations of
Korean LaTeX. One may add to the list Korean text formatted by a
non-localized version of 'fmt' (in Unix) as another example. To work around
the problem caused by these broken web browsers, some Korean web authors
apply a simple filter to their HTML files that inserts <wbr> between every
pair of Korean syllables. To see what I mean, you may want to take a look at
<http://photon.hgs.yale.edu/~jungshik/lb.html> and
<http://photon.hgs.yale.edu/~jungshik/lbscreenshot.jpg>
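
  A minimal sketch of such a <wbr> filter might look like the following.
It is only an illustration, not the filter any particular author used. It
assumes that "Korean syllable" means a precomposed Hangul syllable in the
range U+AC00..U+D7A3 and that the input is plain text content rather than
full markup (it makes no attempt to skip tags or attribute values). The
name insert_wbr is hypothetical.

```python
import re

# Assumption: a "Korean syllable" is a precomposed Hangul syllable,
# i.e. a code point in U+AC00..U+D7A3. The lookahead keeps the second
# syllable of each pair available to start the next pair.
_HANGUL_PAIR = re.compile(r"([\uAC00-\uD7A3])(?=[\uAC00-\uD7A3])")


def insert_wbr(text: str) -> str:
    """Insert <wbr> between every pair of adjacent Hangul syllables.

    Hypothetical helper: it treats `text` as plain text content and
    does not try to avoid HTML tags or attribute values.
    """
    return _HANGUL_PAIR.sub(r"\1<wbr>", text)


if __name__ == "__main__":
    print(insert_wbr("한국어 문장"))
```

Spaces are left alone, so a browser that already breaks at spaces keeps
doing so and merely gains the extra opportunities between syllables.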

  Let me emphasize that lines can be broken at any syllable boundary
in Korean text (with some obvious exceptions that also apply to English
text, e.g. punctuation marks like '!' and '?' cannot begin a line).
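
  The rule above can be sketched in a few lines of code. This is a
simplified illustration under stated assumptions, not a complete line
breaking algorithm: syllables are taken to be precomposed Hangul syllables
(U+AC00..U+D7A3), the set of characters that may not begin a line
(NO_LINE_START) is an illustrative guess, and the function names are
hypothetical.

```python
# Assumption: an illustrative set of characters that must not begin a line.
NO_LINE_START = set("!?.,)]}")


def is_hangul_syllable(ch: str) -> bool:
    """True for precomposed Hangul syllables (U+AC00..U+D7A3)."""
    return "\uAC00" <= ch <= "\uD7A3"


def break_opportunities(text: str) -> list[int]:
    """Return every index i such that a line may break before text[i].

    A break is allowed after a space, or between two adjacent Hangul
    syllables. It is never allowed before a space or before a character
    in NO_LINE_START.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " " or cur in NO_LINE_START:
            continue
        if prev == " " or (is_hangul_syllable(prev) and is_hangul_syllable(cur)):
            breaks.append(i)
    return breaks


if __name__ == "__main__":
    print(break_opportunities("가나 다!"))
```

With this rule, Latin words still break only at spaces, while Hangul text
gains a break opportunity at every syllable boundary.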

  Secondly, even in Latin scripts (well, at least in English) lines can
be broken not only at spaces but also at syllable boundaries, with a
hyphen. The only difference between Korean and English line breaking is
that Korean doesn't need a hyphen when lines are broken at syllables,
because in Korean, syllables form another visual unit one level higher
than alphabetic/phonetic letters (consonants and vowels).

  Thirdly, the expression 'ideographic boundaries' is not appropriate
at all when describing Korean line breaking rules. More appropriate is
'syllabic boundaries' or 'syllables'.

  Given these, I'd like to suggest that the last sentence (the one that
begins with 'In Korean, for example...') be removed in a future edition,
because Korean is NOT a good example of a case where there can be multiple
line breaking rules depending on user preference.

    Jungshik Shin


