From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Fri Nov 30 2007 - 10:47:19 CST
Arnt Richard Johansen wrote:
> In UAX #14, rule LB15 states "Do not break within '"[', even with
> intervening spaces." This is formalised as
>
> QU SP* × OP
>
> What is the rationale behind this rule?
Beats me. Whatever the rationale might be, the rule is harmful more
often than useful. I'm afraid the line breaking rules as a whole just
try too much: they define detailed rules for combinations, based on the
consideration of some _possible_ scenarios where the combinations might
appear.
> As an example, given a sufficiently small text area width, the
> algorithm will break text this way:
>
> "The
> Wire" (2005)
>
> but never this way:
>
> "The Wire"
> (2005)
>
> which is IMHO more logical.
Unfortunately, this isn't just theoretical. At least Internet Explorer
does this, both for ASCII quotation marks and for English quotation
marks. And IE does not adequately support Unicode control characters for
allowing line breaks, so the only cure is really the use of nonstandard
markup, the <wbr> tag (oddly named, since it does not indicate _word_
break but an allowed string break).
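To make the effect of LB15 concrete, here is a toy sketch in Python. It is
not the UAX #14 algorithm: it assumes that breaks are allowed only after
spaces (a crude stand-in for LB18) and uses tiny hand-picked QU and OP
classes. The names break_opportunities and segments are made up for this
illustration.

```python
# Toy model only: simplified character classes, not the real UAX #14 data.
QU = {'"', "'", "\u201c", "\u201d"}   # assumed quotation class
OP = {"(", "[", "{"}                  # assumed opening punctuation class


def break_opportunities(text, apply_lb15=True):
    """Return indices i such that a new line may start at text[i]."""
    result = []
    for i in range(1, len(text)):
        # Crude stand-in for LB18: a break is possible only after spaces.
        if text[i - 1] != " " or text[i] == " ":
            continue
        if apply_lb15 and text[i] in OP:
            # LB15: QU SP* x OP. Look back past the spaces for a quote.
            j = i - 1
            while j >= 0 and text[j] == " ":
                j -= 1
            if j >= 0 and text[j] in QU:
                continue  # break suppressed by LB15
        result.append(i)
    return result


def segments(text, apply_lb15=True):
    """Split text at every allowed break, dropping trailing spaces."""
    cuts = [0] + break_opportunities(text, apply_lb15) + [len(text)]
    return [text[a:b].rstrip() for a, b in zip(cuts, cuts[1:])]


sample = '"The Wire" (2005)'
print(segments(sample, apply_lb15=True))
print(segments(sample, apply_lb15=False))
```

With the rule applied, the only break this model offers is after "The", so
the closing quotation mark and "(2005)" can never be separated. With the
rule switched off, the break before "(2005)" becomes available as well.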
To take another example that has caused real trouble, consider LB19,
which prohibits line breaks before or after quotation marks. This is
nasty because it prevents a break before a "quoted" expression inside a
sentence, and at least some versions of Microsoft Word take this
seriously.
Line breaking rules are strongly language- and context-dependent, and
they shouldn't really be part of the Unicode Standard, except for some
very basic principles like the special controls for line break. The UAX
#14 rules are probably based on _some_ rational considerations but
oriented towards some largely unspecified situations. There is probably
a lot of language and context dependency hidden in them. And I don't think
the rules have generally been implemented, but they have _partly_ been
implemented in various programs.
Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/