Re: UAX #14: no line breaks between OP and QU, even if there are intervening spaces

From: Jukka K. Korpela (
Date: Fri Nov 30 2007 - 10:47:19 CST

    Arnt Richard Johansen wrote:

    > In UAX #14, rule LB15 states "Do not break within '"[', even with
    > intervening spaces." This is formalised as
    > QU SP* × OP
    > What is the rationale behind this rule?

    Beats me. Whatever the rationale might be, the rule is harmful more
    often than useful. I'm afraid the line breaking rules as a whole just
    try too much: they define detailed rules for combinations, based on the
    consideration of some _possible_ scenarios where the combinations might

    > As an example, given a sufficiently small text area width, the
    > algorithm will break text this way:
    > "The
    > Wire" (2005)
    > but never this way:
    > "The Wire"
    > (2005)
    > which is IMHO more logical.
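
The behaviour in the quoted example can be reproduced with a small sketch. It is not the UAX #14 algorithm: it assumes a toy classification (ASCII and curly quotation marks as QU, opening brackets as OP, the space as SP, everything else as AL) and applies only two ideas, "break after spaces" and the LB15 prohibition. The names line_class and break_opportunities are made up for this illustration.

```python
# Toy illustration of LB15 ("QU SP* x OP"); NOT a full UAX #14 implementation.
# The character classes below are a simplified assumption for this example.
QUOTES = {'"', "'", "\u201c", "\u201d", "\u2018", "\u2019"}
OPENERS = {"(", "[", "{"}


def line_class(ch):
    """Map a character to a (heavily simplified) line breaking class."""
    if ch == " ":
        return "SP"
    if ch in QUOTES:
        return "QU"
    if ch in OPENERS:
        return "OP"
    return "AL"  # everything else is lumped together here


def break_opportunities(text):
    """Return indices i such that a line may start at text[i]."""
    classes = [line_class(ch) for ch in text]
    allowed = []
    for i in range(1, len(text)):
        # Only positions directly after a run of spaces are candidates.
        if classes[i - 1] != "SP" or classes[i] == "SP":
            continue
        # Walk back over the spaces to find what precedes them.
        j = i - 1
        while j >= 0 and classes[j] == "SP":
            j -= 1
        # LB15: no break between QU and OP, even with intervening spaces.
        if j >= 0 and classes[j] == "QU" and classes[i] == "OP":
            continue
        allowed.append(i)
    return allowed


if __name__ == "__main__":
    print(break_opportunities('"The Wire" (2005)'))
    print(break_opportunities('The Wire (2005)'))
```

With the quoted title, the only break this sketch allows is before "Wire"; the position before "(2005)" is suppressed solely because a quotation mark precedes the space. Without the quotation marks, both positions are allowed.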

    Unfortunately, this isn't just theoretical. At least Internet Explorer
    does this, both for ASCII quotation marks and for English quotation
    marks. And IE does not adequately support Unicode control characters for
    allowing line breaks, so the only cure is really the use of nonstandard
    markup, the <wbr> tag (oddly named, since it does not indicate _word_
    break but an allowed string break).

    To take another example that has caused real trouble, consider LB19,
    which prohibits line breaks before or after quotation marks. This is
    nasty because it prevents a break before a "quoted" expression inside a
    sentence, and at least some versions of Microsoft Word take this

    Line breaking rules are strongly language- and context-dependent, and
    they shouldn't really be part of the Unicode Standard, except for some
    very basic principles like the special controls for line break. The UAX
    #14 rules are probably based on _some_ rational considerations but
    oriented towards some largely unspecified situations. There is probably
    a lot of language and context dependency hidden in them. And I don't think the
    rules have generally been implemented, but they have _partly_ been
    implemented in various programs

    Jukka K. Korpela ("Yucca")

    This archive was generated by hypermail 2.1.5 : Fri Nov 30 2007 - 10:50:11 CST