Re: UAX #14: no line breaks between OP and QU, even if there are intervening spaces

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Nov 30 2007 - 14:50:01 CST


    Jukka Korpela responded:

    > Arnt Richard Johansen wrote:
    >
    > > In UAX #14, rule LB15 states "Do not break within '"[', even with
    > > intervening spaces." This is formalised as
    > >
    > > QU SP* × OP
    > >
    > > What is the rationale behind this rule?
    >
    > Beats me. Whatever the rationale might be, the rule is harmful more
    > often than useful.

    Please see the clarification under the "QU" section in
    the proposed update to UAX #14:

    http://www.unicode.org/reports/tr14/tr14-21.html

    > Line breaking rules are strongly language- and context-dependent,

    True.

    > and
    > they shouldn't really be part of the Unicode Standard, except for some
    > very basic principles like the special controls for line break.

    Perhaps Asmus will wade in here with a fuller justification,
    but the consensus in the UTC has been that it is better to
    write out an explicit *default* line breaking specification
    that implementers can (and should) then tailor for specific
    situations and languages, rather than simply letting a thousand
    flowers bloom with no recommendations whatsoever -- which could
    only lead to more unexplained interoperability problems.
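
    As a rough illustration of what "default, but tailorable" could
    look like for the rule under discussion, here is a minimal Python
    sketch of LB15 alone (QU SP* × OP). The class table is a small
    hand-picked assumption, not the real LineBreak.txt data, and the
    names line_break_class, lb15_prohibits_break and apply_lb15 are
    hypothetical. It does not implement any other UAX #14 rule.

```python
# Hypothetical, hand-picked class table; the real assignments come from
# the LineBreak.txt data file and cover far more characters.
QU = {'"', "'", "\u00ab", "\u00bb", "\u2018", "\u2019", "\u201c", "\u201d"}
OP = {"(", "[", "{"}


def line_break_class(ch):
    """Return a simplified line break class for a single character."""
    if ch in QU:
        return "QU"
    if ch in OP:
        return "OP"
    if ch == " ":
        return "SP"
    return "XX"  # placeholder for every class this sketch ignores


def lb15_prohibits_break(text, i, apply_lb15=True):
    """Return True if LB15 (QU SP* x OP) forbids a break before text[i].

    apply_lb15=False models a tailoring that switches the rule off.
    """
    if not apply_lb15 or not 0 < i < len(text):
        return False
    if line_break_class(text[i]) != "OP":
        return False
    # Walk back over any run of spaces, then require a quotation mark.
    j = i - 1
    while j >= 0 and line_break_class(text[j]) == "SP":
        j -= 1
    return j >= 0 and line_break_class(text[j]) == "QU"
```

    With the default setting, the position before "[" in
    'he said " [sic]' is reported as a prohibited break; a tailoring
    that passes apply_lb15=False leaves the decision to whatever
    other rules the implementation applies.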

    > The UAX
    > #14 rules are probably based on _some_ rational considerations but
    > oriented towards some largely unspecified situations. There is probably
    > a lot of language and context dependency hidden in them.

    Perhaps. But I think people may not be reading the UAX scoping
    carefully enough:

    "... This annex provides more detailed information about
    default line breaking behavior reflecting best practices for
    ^^^^^^^
    the support of multilingual texts."
                   ^^^^^^^^^^^^
                   
    That doesn't mean that it is normative, required behavior.
    Nor does it mean that the default UAX #14 algorithm purports
    to be best practice for any given language, including English
    text.

    "For most Unicode characters, considerable variation in line
    breaking behavior can be expected, including variation based
    on local or stylistic preferences."

    And of course, the handling of line breaking in the vicinity
    of quotation marks is a prime example of that, because
    quotation conventions are so variable.

    --Ken


