RE: Korean line breaking rules : Unicode 3.0 (p. 124)

From: Kenneth Whistler
Date: Tue Mar 14 2000 - 20:32:52 EST

Erland Sommarskog suggested:

> Rick McGowan writes:
> > I think that unfortunately both Hoon Kim and Jungshik Shin have
> > *entirely* mis-interpreted the text. The text says:
> >

kenw inserts here the correct, exact citation of the text on p. 124:

"In Korean, for example, lines may be broken either at spaces (as in Latin
text) or on ideograph boundaries (as in Chinese)."

> >
> > The word "or" on the second line would never be interpreted as an "exclusive
> > or", it is an "inclusive or". In "C Language" syntax, it means "A|B"; it
> > does not mean "A^B".
> >
> > In that light, some of their previous comments should probably be re-examined.
>
> Anyway, the "or" you mention does not appear there by itself. It is
> coupled with the "either" on the line above. And "either or" often
> means an exclusive or. In less exact everyday talk, "either or" can
> often be inclusive, but in a technical text I would suggest that such
> usage should be avoided.
>
> I can't speak for our Korean friends, but possibly had the "either"
> never slipped in, their confusion would never have arisen.

This is quite possibly the source of the misinterpretation, and should
be taken under advisement by the editors to clarify the next edition.

However, I would like to point out that the usual, colloquial extension
of this phraseology to indicate an *exclusive* or is:

   "either A or B but not both"

Furthermore, some familiarity with the breaking behavior for normal Latin
text and for Chinese text is assumed. With even a modicum of familiarity,
a strict exclusive reading is not sensible:

"In Korean, for example, lines may be broken either at spaces [only, but
not at ideograph boundaries] (as in Latin text) or on ideograph boundaries
[only, but not at spaces] (as in Chinese)."

The latter case of course makes no sense, since Chinese also allows line
breaks at spaces. The first case only makes limited sense, since Latin text
normally does not contain ideographs, so breaking on them is not defined.
And further, Latin text allows breaking on many other things besides

Perhaps someone would care to provide a one- to two-sentence short
summary of Korean line-breaking in such a way that it would serve as
an unambiguous exemplification of the point, which in any case was that
boundary determination "will need to be customized according to locale
and user preferences."
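
To make the inclusive reading concrete, here is a minimal sketch in Python.
It is not the Unicode line-breaking algorithm, only an illustration under
stated assumptions: the function names and the two flags are hypothetical,
"ideograph boundary" is approximated as the boundary between two precomposed
Hangul syllables (U+AC00..U+D7A3), and only U+0020 counts as a space. The
two flags stand in for the locale and user preferences mentioned above.

```python
def is_hangul_syllable(ch):
    # Assumption: only precomposed Hangul syllables, U+AC00..U+D7A3.
    return '\uac00' <= ch <= '\ud7a3'


def break_opportunities(text, at_spaces=True, at_syllables=True):
    """Return each index i where a line break is allowed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        # Rule A: break after a run of spaces (as in Latin text).
        after_space = at_spaces and prev == ' ' and cur != ' '
        # Rule B: break between adjacent syllables (as in Chinese text).
        between_syllables = (at_syllables
                             and is_hangul_syllable(prev)
                             and is_hangul_syllable(cur))
        # Inclusive or: either rule alone is enough to permit a break.
        if after_space or between_syllables:
            breaks.append(i)
    return breaks


sample = "한국어 줄바꿈"
print(break_opportunities(sample))                      # both rules
print(break_opportunities(sample, at_syllables=False))  # spaces only
print(break_opportunities(sample, at_spaces=False))     # syllables only
```

With both flags on, the sample may break after the space and between any
two adjacent syllables; switching one flag off gives the narrower behavior
that a given locale or user might prefer.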


