From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Nov 30 2007 - 14:50:01 CST
Jukka Korpela responded:
> Arnt Richard Johansen wrote:
>
> > In UAX #14, rule LB15 states "Do not break within '"[', even with
> > intervening spaces." This is formalised as
> >
> > QU SP* × OP
> >
> > What is the rationale behind this rule?
>
> Beats me. Whatever the rationale might be, the rule is harmful more
> often than useful.
Please see the clarification under the "QU" section in
the proposed update to UAX #14:
http://www.unicode.org/reports/tr14/tr14-21.html
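For readers following along, here is a minimal sketch of what the default LB15 rule (QU SP* × OP) checks. It is not a UAX #14 implementation. It assumes each character's Line_Break class has already been looked up, and it ignores every other rule. In particular, LB7 (× SP) separately forbids breaking before the spaces themselves. The function name lb15_forbids_break is hypothetical.

```python
from typing import Sequence


def lb15_forbids_break(classes: Sequence[str], i: int) -> bool:
    """Return True if the default LB15 rule (QU SP* x OP) forbids a break
    between classes[i - 1] and classes[i].

    `classes` holds one Line_Break property value per character, e.g.
    "QU", "SP", "OP", "AL". Hypothetical helper: only this one rule is
    modelled, not the full UAX #14 algorithm.
    """
    # The rule only constrains a break immediately before an opening
    # punctuation character (class OP).
    if i <= 0 or i >= len(classes) or classes[i] != "OP":
        return False
    j = i - 1
    # Skip back over any run of spaces (the SP* part of the rule).
    while j >= 0 and classes[j] == "SP":
        j -= 1
    # The break is forbidden only if the spaces are preceded by a quote.
    return j >= 0 and classes[j] == "QU"


# '"x" [y]' -> QU AL QU SP OP AL CP: no break allowed before '['.
print(lb15_forbids_break(["QU", "AL", "QU", "SP", "OP", "AL", "CP"], 4))
```

Because U+0022 QUOTATION MARK has class QU whether it opens or closes a quotation, the same check also keeps `"no" (` together in text such as `he said "no" (twice)`. This is the kind of case a tailored implementation may want to handle differently.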
> Line breaking rules are strongly language- and context-dependent,
True.
> and
> they shouldn't really be part of the Unicode Standard, except for some
> very basic principles like the special controls for line break.
Perhaps Asmus will wade in here with a fuller justification,
but the consensus in the UTC has been that it is better to
write out an explicit *default* line breaking specification
that implementers can (and should) then tailor for specific
situations and languages, rather than simply letting a thousand
flowers bloom with no recommendations whatsoever -- which could
only lead to more unexplained interoperability problems.
> The UAX
> #14 rules are probably based on _some_ rational considerations but
> oriented towards some largely unspecified situations. There is probably
> a lot of language and context dependency hidden in them.
Perhaps. But I think people may not be reading the UAX scoping
carefully enough:
"... This annex provides more detailed information about
default line breaking behavior reflecting best practices for
^^^^^^^
the support of multilingual texts."
^^^^^^^^^^^^
That doesn't mean that it is normative, required behavior.
Nor does it mean that the default UAX #14 algorithm purports
to be best practice for any given language, including English
text.
"For most Unicode characters, considerable variation in line
breaking behavior can be expected, including variation based
on local or stylistic preferences."
And of course, the handling of line breaking in the vicinity
of quotation marks is a prime example of that, because
quotation conventions are so variable.
--Ken