Re: UAX#14-20: undesirable line breaking opportunities (parentheses and quotation marks)

From: Asmus Freytag
Date: Wed Jul 25 2007 - 15:58:41 CDT


    With all changes to UAX#14, it is important to make sure that existing
    implementations can continue to be conformant as much as possible, that
    constraints on character behavior are retained for formatting characters
    (so that people cannot redesign a ZWSP or WJ by treating it in novel and
    incompatible ways and claiming conformance), while making sure that
    there are no unreasonable (and artificial) restraints on higher level
    protocols. I believe that, in large measure, Andy has been able to do
    that in the latest update.

    As for the putative issue you raise here, Ken has already pointed out
    why there may not in fact be a problem where you see one. Your further
    statements show that you haven't fully understood some of the core
    concepts of the way the default line breaking algorithm is intended to work.

    On 7/25/2007 5:17 AM, Philippe Verdy wrote:
    > As an alternative to my proposal, parentheses or quotation marks could also
    > be described by making them inherit the line breaking opportunity property
    > from the character they immediately surround, while keeping the prohibition
    > of line breaking between the parenthesis/quotation mark and the inner character
    > it touches.
    For quotation marks, it's not possible to determine what the inner
    character is on the *character* level. See the fascinating discussion of
    this topic in the relevant chapter of the standard.
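To illustrate why the inner side of a quotation mark cannot be read off the character itself, here is a small sketch. The function names and the heuristic are hypothetical, chosen for illustration only and not taken from UAX #14: the heuristic guesses the inner side from the General_Category alone (Pi/Ps treated as opening, Pf/Pe as closing). The samples rely on well-known typographic conventions: German „Text“ closes with U+201C (a "left" quote, gc=Pi), and Swedish ”Text” uses U+201D (gc=Pf) on both sides.

```python
import unicodedata

def naive_inner_side(ch):
    """Guess on which side of a quotation mark the quoted text lies, using
    only the character's General_Category. Hypothetical heuristic for
    illustration; it is not part of UAX #14."""
    cat = unicodedata.category(ch)
    if cat in ("Pi", "Ps"):   # initial/open punctuation: text should follow
        return "right"
    if cat in ("Pf", "Pe"):   # final/close punctuation: text should precede
        return "left"
    return "unknown"

def heuristic_matches(quoted):
    """For a string of the form <open>text<close>, report whether the
    heuristic finds the correct inner side for each of the two marks."""
    opening, closing = quoted[0], quoted[-1]
    return (naive_inner_side(opening) == "right",
            naive_inner_side(closing) == "left")

samples = {
    "English": "\u201cText\u201d",  # “Text”
    "German":  "\u201eText\u201c",  # „Text“
    "Swedish": "\u201dText\u201d",  # ”Text”
}

for lang, s in samples.items():
    print(lang, heuristic_matches(s))
```

The heuristic only gets English right; for German it misjudges the closing mark and for Swedish the opening mark, because the same code point plays different roles depending on the language convention, not on its identity.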
    > This would correctly handle the case of parentheses used to surround
    > ideographs, because parentheses should not be detached from the inner
    > ideograph, even though they should still remain breakable from the outer
    > character, as if these characters were absent from the text (so the line
    > break rule would treat parentheses and quotation marks as if they were
    > diacritics and part of a larger unbreakable grapheme cluster with the inner
    > character used as the effective base, and line breaking would be analyzed by
    > first ignoring them but just checking the breaking opportunities between the
    > inner and outer character.)
    The default way to handle parens in ideographic context needs to match
    the common or legacy practice; that's at least the design point: to
    allow the default to reflect (reasonable) existing or legacy practice
    for a given script out of the box. A complicated re-analysis might be a
    basis for your own customization for a very sophisticated layout engine,
    but is misplaced here.

    Also, the class CM inherits from the *preceding* character. Your model
    would result in inheritance in the other direction, which would
    invalidate all existing implementations (not even those that import the
    UCD tables could update to such a scheme w/o changes in architecture).
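The backward-looking inheritance of class CM corresponds to what recent revisions of UAX #14 express as rules LB9 and LB10: treat X CM* as X (unless X is BK, CR, LF, NL, SP or ZW), and treat any remaining CM as AL. The sketch below is a minimal illustration under stated assumptions: the class table is a hypothetical toy subset, whereas a real implementation would load the full property from LineBreak.txt.

```python
# Toy Line_Break class table (assumption): a real implementation would load
# the full property from LineBreak.txt in the Unicode Character Database.
TOY_CLASSES = {
    " ": "SP",
    "(": "OP",
    ")": "CL",
    "a": "AL",
    "\u6f22": "ID",   # CJK UNIFIED IDEOGRAPH-6F22
    "\u0301": "CM",   # COMBINING ACUTE ACCENT
}

# Classes to which a following CM does not attach.
NO_ATTACH = {"BK", "CR", "LF", "NL", "SP", "ZW"}

def resolve_cm(text):
    """Return the effective line-break class of each character, letting
    every CM take the class of the character *before* it (X CM* -> X).
    A CM with no suitable predecessor is treated as AL."""
    resolved = []
    for ch in text:
        cls = TOY_CLASSES.get(ch, "AL")  # unknown characters default to AL here
        if cls == "CM":
            if resolved and resolved[-1] not in NO_ATTACH:
                cls = resolved[-1]       # inherit from the preceding character
            else:
                cls = "AL"               # isolated mark
        resolved.append(cls)
    return resolved

print(resolve_cm("\u6f22\u0301\u0301"))  # ['ID', 'ID', 'ID']
print(resolve_cm(" \u0301a"))            # ['SP', 'AL', 'AL']
```

Because each mark only looks backward, a single left-to-right pass settles its class as soon as the mark is read. Under the proposed model, an opening parenthesis would instead take its class from the character that follows it, which is exactly the reversal of direction that existing implementations are not built for.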


    This archive was generated by hypermail 2.1.5 : Wed Jul 25 2007 - 16:01:01 CDT