Re: UAX #14: no line breaks between OP and QU, even if there are intervening spaces

From: Jukka K. Korpela
Date: Sun Dec 02 2007 - 11:47:56 CST

    Asmus Freytag wrote:

    > You seem to want a number of contradictory things.

    Don't we all? But I wouldn't refer to contradiction here.

    > Rule LB15 got its origin from just such an attempt to be conservative
    > when in doubt, realizing that allowing a bad break can be more
    > damaging than missing a break opportunity.

    Yes, but it forbids a break at a space. I don't think I'm
    self-contradictory in saying "be conservative in allowing breaks, but
    allow breaking at a space". A space is an exception rather than a
    contradiction. It is based on a long writing tradition in some cultures.
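
    The difference between the two positions can be sketched in a few
    lines of Python. This is only a toy model, not an implementation of
    UAX #14: it assumes the rule has the form QU SP* × OP, offers break
    opportunities only after spaces, and uses hypothetical helper names
    (`lb15_forbids`, `break_allowed`, and the `space_wins` flag) that do
    not come from the standard.

```python
# Toy model of the disagreement. Class names follow UAX #14:
# QU = quotation mark, OP = opening punctuation, SP = space, AL = letter.
# Assumption: the rule under discussion has the form "QU SP* x OP".


def lb15_forbids(classes, i):
    """Return True if a 'QU SP* x OP' rule forbids a break before index i."""
    if classes[i] != "OP":
        return False
    j = i - 1
    # Skip any run of spaces immediately before the opening punctuation.
    while j >= 0 and classes[j] == "SP":
        j -= 1
    return j >= 0 and classes[j] == "QU"


def break_allowed(classes, i, space_wins=False):
    """Return True if a break is allowed between classes[i-1] and classes[i].

    Only two rules are modelled: breaks are offered after spaces, and the
    LB15-like prohibition can veto them. With space_wins=True the space
    always wins, which models "allow breaking at a space".
    """
    if i <= 0 or i >= len(classes):
        return False
    if classes[i - 1] != "SP":
        return False
    if space_wins:
        return True
    return not lb15_forbids(classes, i)


# The sequence  a " (b  expressed as line-break classes.
sample = ["AL", "SP", "QU", "SP", "OP", "AL"]
for position in range(1, len(sample)):
    print(position,
          break_allowed(sample, position),
          break_allowed(sample, position, space_wins=True))
```

    In this model the two variants differ only at the space between the
    quotation mark and the opening parenthesis; everywhere else they
    agree.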

    > The algorithm is intended for multilingual text or for multilingual
    > environments. It can therefore _not_ simply assume that spaces are
    > what makes the break. Doing so, would cause very suboptimal
    > typography for Asian contexts.

    I'm not saying that "spaces are what makes the break". I suggested a
    simple approach that combines script-specific rules, breaking at spaces,
    and explicit line break controls. I don't think it's possible to get
    much farther at the "multilingual" level, which really means the
    language-ignorant level. (This is basically about texts in unknown
    languages, "unknown" in the sense that the processing software does not
    apply language-sensitive rules.) If you try harder, confusion and
    problems arise.

    > The original algorithm, before rule 15, was tested in shipping
    > implementations before offering it as a seed for the standardization
    > effort. It was itself based on European de-facto practice and certain
    > Asian standards in the area of linebreaking.

    As far as I understand, the Unicode line breaking rules have varied
    a lot (and programs may still reflect older versions - almost daily I
    see typeset text in which abc:abc has been incorrectly broken after
    the colon), and I have never seen any software that comes even close
    to applying the Unicode rules. But I have seen software that applies
    _some_ of the rules. For obvious reasons, I mostly notice such things
    when they produce outright wrong and mad results.

    > Because a bad linebreak following an opening punctuation (or right
    > before a closing punctuation) is a very serious issue in non-Western
    > line layout, the UTC adopted the cautious formulation of Rule 15.

    I still haven't seen a case where " (..." appears, for any quotation
    marks. I don't deny the possibility of such expressions. I'm just saying
    that they must be extremely rare, if not contrived, and that _they_
    (rather than some much more common situations) should be handled by
    language-specific exceptions to line breaking, or by a no-break space,
    or by some other tools.

    Jukka K. Korpela ("Yucca")

    This archive was generated by hypermail 2.1.5 : Sun Dec 02 2007 - 11:50:03 CST