Re: Korean linebreaking and UTR14 (was Re: extracting words)

From: Mark Davis
Date: Mon Feb 12 2001 - 22:53:52 EST

Asmus Freytag is the one to talk to; he can look into this.


----- Original Message -----
From: "Jungshik Shin"
To: "Unicode List"
Sent: Monday, February 12, 2001 13:33
Subject: Korean linebreaking and UTR14 (was Re: extracting words)

> On Sun, 11 Feb 2001, Mark Davis wrote:
> MD> Please read TUS Chapter 5 and the Linebreak TR before proceeding, as I
> MD> recommended in my last message. The Unicode Standard is online, as is
> MD> the TR. Both can be found by going to, and selecting the
> MD> topic. The TR in particular discusses the recommended approach to line
> MD> breaking in great detail.
> As I wrote when TUS 3.0 came out, I cannot help wondering where the idea
> that led to the following in the TR on line breaking (and to what's written
> about it in Chapter 5 of TUS 3.0) came from.
> UTR14> Korean may alternately use a space-based (style 1) instead of the
> UTR14> style 2 context analysis.
> UTR14> 1. Korean uses either implicit breaking around
> UTR14> Hangul and ideographs or uses spaces. Reference [1] shows
> UTR14> how this can be elegantly handled by the second or third
> UTR14> method. Only the intersection of ID/ID, AL/ID and ID/AL
> UTR14> are affected. For alphabetic style line breaking, breaks
> UTR14> for these four cases require space, for ideographic style
> UTR14> line breaking, these four cases don't require spaces.
> where style 1 and style 2 are defined as
> UTR14> 1. Western (spaces and hyphens are used to determine breaks)
> UTR14> 2. East Asian (lines can break anywhere, unless prohibited)
> Let me make it clear that virtually NO books published in Korean use the
> space-based (style 1) line breaking rule. The style 2 line breaking rule
> is used *exclusively* for modern Korean text, regardless of what some broken
> Korean word processors offer as an alternative to style 2 and what
> some web browsers (e.g. Netscape 4.x; Mozilla has fixed this problem) do.
> I'm very alarmed to find that this 'misinformation' has crept into TUS and
> UTR14 (now UAX #14). It would be nice if somebody in charge could get
> this straightened out.
> Regards,
> Jungshik Shin

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT