Re: Korean linebreaking and UTR14 (was Re: extracting words)

From: Mark Davis (markdavis34@home.com)
Date: Tue Feb 13 2001 - 09:53:31 EST


If I wanted to get someone's attention, I would send them a direct message.
Many people on the list, myself included, get swamped at times and don't
necessarily look at every message.

Mark

----- Original Message -----
From: "Jungshik Shin" <jshin@mailaps.org>
To: "Unicode List" <unicode@unicode.org>
Sent: Monday, February 12, 2001 20:30
Subject: Re: Korean linebreaking and UTR14 (was Re: extracting words)

>
>
> On Mon, 12 Feb 2001, Mark Davis wrote:
>
> Thank you for your answer.
>
> > Asmus Freytag is the one to talk to; he can look into this.
>
> Do you think I should contact him directly off-line? I thought he was on
> this list, both now and back in March 2000 when I wrote about TUS 3.0
> p. 124.
>
> > On Mon, 12 Feb 2001, "Jungshik Shin" <jshin@mailaps.org> wrote:
> > > On Sun, 11 Feb 2001, Mark Davis wrote:
> > >
> > > MD> Please read TUS Chapter 5 and the Linebreak TR before proceeding,
> > > MD> as I recommended in my last message. The Unicode standard is
> > > MD> online, as is
>
> > > As I wrote when TUS 3.0 came out, I cannot help wondering where the
> > > idea that leads to the following in the TR on line breaking (and
> > > what's written about it in Chap 5 of TUS 3.0) came from.
> > >
> > > UTR14> Korean may alternately use a space-based (style 1) instead of
> > > UTR14> the style 2 context analysis.
>
> BTW, this clearly shows that what Rick McGowan wrote about 'either ... or'
> in response to what I wrote about the Korean line breaking rule (TUS 3.0
> p. 124) in March 2000 is not right, as I argued then. I'm sure he's
> right about 'either ... or' in English grammar, but the intention of the
> author is on my side if the author of UTR 14 is the same as the author of
> the part in question in TUS 3.0. I'm enclosing at the end of this message
> a part of my message in response to him.
>
>
> > > I'm very alarmed to find that this 'misinformation' has crept into TUS
> > > and UTR14 (now UAX #14). It would be nice if somebody in charge could
> > > get this straightened out.
>
> A correction didn't make it into Unicode 3.1, either. What would be the
> best way to get it addressed before the next revision comes out? I'm
> afraid just raising it on this list wouldn't be sufficient (of course, I
> should have followed up more vigorously last year).
>
> Regards,
>
> Jungshik Shin
>
>
> Enc.
>
> 1. Two messages of mine
> the first one : March 1, 2000
> the second one: March 2, 2000
>
> From: Jungshik Shin <jungshik.shin@yale.edu>
> Subject: Korean line breaking rules : Unicode 3.0 (p. 124)
> Date: Wed, 1 Mar 2000 19:23:23 -0800 (PST)
>
> On Sun, 13 Feb 2000, Kenneth Whistler wrote:
>
> > Lest anyone feel unduly constrained, let me note that now that
> > the editorial committee has closed the book, so to speak, on Unicode
> > 3.0, all of you who are about to open the book for the first time
> > should feel free to unleash your commentary on the text.
>
> I've just received my copy of the Unicode 3.0 book, so here goes
> my first commentary.
>
> On page 124 (section 5.15 Locating Text Element Boundaries),
> the third paragraph has the following near the end:
>
> U3.0> In particular, word, line, and sentence boundaries will need to
> U3.0> be customized according to locale and user preference. In Korean,
> U3.0> for example, lines may be broken either at spaces (as in Latin text)
> U3.0> or on ideographic boundaries (as in Chinese).
>
> First of all, it's a great mystery to me how on earth this
> strange notion of Korean having *two* different line breaking rules (as
> opposed to one) crept into the expertise of non-Korean experts on Korean
> and finally made it into the Unicode 3.0 book and the Unicode TR on line
> breaking.
>
> None of the dozens of Korean books on my bookshelves that I've just
> gone through breaks lines *exclusively* at spaces. All of them
> break lines freely at *syllables*. The only places I can think of where
> lines are broken *exclusively* at spaces (for Korean text) are web
> browsers like Netscape and MS IE, which are completely *broken* as far as
> Korean line breaking is concerned, and possibly earlier implementations
> of Korean LaTeX. One may add to the list Korean text formatted by a
> non-localized version of 'fmt' (in Unix) as another example. To work
> around the problem caused by these broken web browsers, some Korean web
> authors apply a simple filter to their HTML files that inserts <wbr>
> between every pair of Korean syllables. To see what I mean, you may want
> to take a look at
> <http://photon.hgs.yale.edu/~jungshik/lb.html> and
> <http://photon.hgs.yale.edu/~jungshik/lbscreenshot.jpg>
>
> Let me emphasize that lines can be broken at any syllable boundary
> in Korean text (except for some obvious exceptions that also apply in
> English text: e.g. punctuation marks like '!' and '?' cannot begin a line).
>
> Secondly, even in Latin scripts (well, at least in English) lines can
> be broken not only at spaces but also at syllables (syllabic boundaries)
> with a hyphen. The only difference between Korean line breaking and
> English line breaking is that Korean doesn't need a hyphen when lines are
> broken at syllables, because in Korean syllables form another visual unit
> a level higher than alphabetic/phonetic letters (consonants and vowels).
>
> Thirdly, the expression 'ideographic boundaries' is not appropriate;
> it should be 'syllabic boundaries' or 'syllables'.
>
> Given these, I'd like to suggest that the last sentence (the one that
> begins with 'In Korean, for example...') be removed in a future edition,
> because Korean is NOT a good example of a case where there can be
> multiple line breaking rules depending on user preference.
>
> Jungshik Shin
>
> From: Jungshik Shin <jungshik.shin@yale.edu>
> Subject: RE: Korean line breaking rules : Unicode 3.0 (p. 124)
> Date: Thu, 2 Mar 2000 12:20:31 -0800 (PST)
>
> On Thu, 2 Mar 2000, Rick McGowan wrote:
>
> > I think that unfortunately both Hoon Kim and Jungshik Shin have
> > *entirely* mis-interpreted the text. The text says:
>
>
> > U3.0> for example, lines may be broken either at spaces (as in Latin
> > U3.0> text) or on ideographic boundaries (as in Chinese).
>
> > The word "or" on the second line would never be interpreted as an
> > "exclusive or", it is an "inclusive or". In "C Language" syntax, it
> > means "A|B"; it does not mean "A^B".
> U3.0> In particular, word, line, and sentence boundaries will need to
> U3.0> be customized according to locale and user preference. In Korean,
>
> If it was written with that intention, what would you say about
> the preceding two lines? What's 'user preference' here? It implies
> 'exclusive or', doesn't it? In other words, it implies users may choose
> to turn off 'B', doesn't it? (No Korean typesetter in her/his right mind
> would do that.) If not, what's the point of taking Korean line breaking
> as an example after that sentence about 'user preference'?
>
> On top of that, if that's your intention, it'd be clearer
> to say 'lines can be broken on both spaces and syllable boundaries' (or
> on any syllable boundaries including spaces), wouldn't it?
>
> > In that light, some of their previous comments should probably be
> > re-examined.
>
> Nonetheless, the last sentence of the paragraph in
> question about Korean line breaking had better be removed (it's not
> necessary at all in my opinion) to avoid the unnecessary confusion
> it can lead to (as is evident in Netscape's implementation of Korean line
> breaking).
>
> Jungshik Shin
>
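
For readers who want to see the <wbr> workaround from the March 1, 2000
message concretely, here is a minimal sketch in Python. It is not the
filter those web authors actually used; the function name insert_wbr is
hypothetical, and the sketch assumes plain text made of precomposed Hangul
syllables (U+AC00 to U+D7A3). It is not HTML-aware, so running it over
markup with Korean text inside tags or attributes would damage them.

```python
import re

# Precomposed Hangul syllables occupy U+AC00..U+D7A3 in Unicode.
_HANGUL = "\uAC00-\uD7A3"

# Match a Hangul syllable that is immediately followed by another one.
# The lookahead leaves the second syllable unconsumed so it can start
# the next match.
_BETWEEN_SYLLABLES = re.compile(f"([{_HANGUL}])(?=[{_HANGUL}])")


def insert_wbr(text: str) -> str:
    """Insert <wbr> between every pair of adjacent Hangul syllables.

    Illustrative only: assumes `text` is plain text, not HTML markup.
    """
    return _BETWEEN_SYLLABLES.sub(r"\1<wbr>", text)


if __name__ == "__main__":
    print(insert_wbr("한국어 text!"))
```

Because <wbr> is only placed between two syllables, no break opportunity
is added before a trailing '!' or '?', which matches the exception noted
in the message that such punctuation cannot begin a line.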



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:18 EDT