Re: UAX#14-20: undesirable line breaking opportunities (parentheses and quotation marks)

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Fri Jul 27 2007 - 00:34:58 CDT


    On 7/26/2007 2:28 AM, Philippe Verdy wrote:
    >
    >> -----Original Message-----
    >> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    >> Sent: Thursday, 26 July 2007 09:39
    >> To: 'Kenneth Whistler'
    >> Cc: 'unicode@unicode.org'
    >> Subject: RE: UAX#14-20: undesirable line breaking opportunities (parentheses
    >> and quotation marks)
    >>
    >>
    >>> And in particular, the relevant rules are:
    >>> (...)
    >>> LB30 Do not break between letters, numbers, or ordinary symbols and
    >>> opening or closing punctuation.
    >>>
    >>> (AL | NU) × OP
    >>> CL × (AL | NU)
    >>>
    >>> Those rules seem *already* to be doing exactly what you seem to
    >>> be asking for.
    >>>
    >
    > If you really think that these rules are sufficient,
    It's not just Ken who thinks so.
    > I still maintain that
    > this rule is ambiguous and in fact consists of TWO separate rules that
    > are incorrectly summarized by its description (the term "between" combined
    > with the "or" in "opening or closing" is the main source of confusion).
    >
    > So I suggest rewriting it as:
    >
    > LB30.1 Do not break after letters, numbers, or ordinary symbols
    > and before opening punctuation.
    >
    > (AL | NU) × OP
    >
    > LB30.2 Do not break after closing punctuation and
    > before letters, numbers, or ordinary symbols.
    >
    > CL × (AL | NU)
    >
    In contrast to other parts of the standard, the *text* of these rules is
    merely a standard English approximation of the formal statement of the
    rule given in the formula. Your rewrite might be clearer, but it doesn't
    change the rule as such.
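    To make this concrete, here is a minimal Python sketch (function names
    hypothetical; class names as in UAX #14, where AL also covers ordinary
    symbols) that encodes LB30 once as the single formula and once as the
    proposed LB30.1/LB30.2 pair, then compares them over every ordered pair
    of a small class subset. Both formulations prohibit exactly the same
    breaks.

```python
from itertools import product

# A small subset of UAX #14 line-breaking classes, enough to exercise LB30.
CLASSES = ["AL", "NU", "OP", "CL", "ID", "SP"]

def lb30_formula(before: str, after: str) -> bool:
    """True if LB30, as given by its formula, prohibits a break in this pair."""
    return (before in ("AL", "NU") and after == "OP") or (
        before == "CL" and after in ("AL", "NU")
    )

def lb30_1(before: str, after: str) -> bool:
    # Proposed LB30.1: no break after AL/NU when followed by OP.
    return before in ("AL", "NU") and after == "OP"

def lb30_2(before: str, after: str) -> bool:
    # Proposed LB30.2: no break after CL when followed by AL/NU.
    return before == "CL" and after in ("AL", "NU")

def split_rules(before: str, after: str) -> bool:
    return lb30_1(before, after) or lb30_2(before, after)

# Compare both formulations over every ordered pair of classes.
mismatches = [
    (b, a)
    for b, a in product(CLASSES, repeat=2)
    if lb30_formula(b, a) != split_rules(b, a)
]
print(mismatches)  # []
```

    Splitting the prose changes how the rule reads, not which pairs it
    covers.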

    Also, none of the rules can be taken in isolation. For example, the
    behavior of ID OP is described in rule LB31, because no earlier rule
    contains the sequence ID OP. You can see the effect of this chaining by
    hovering over the cells in the pair table in section 7.3.
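    The chaining can be pictured as evaluating the rules in order and letting
    the first rule that matches a pair decide it. The Python sketch below is
    deliberately abbreviated to LB30 and LB31 (names such as decide() and
    RULES are hypothetical); a full implementation evaluates every rule from
    LB1 on, so pairs that an earlier rule covers, such as AL AL under LB28,
    would be decided differently there.

```python
from typing import Callable, Optional

NO_BREAK = "NO_BREAK"  # written as "×" in UAX #14
BREAK = "BREAK"        # written as "÷" in UAX #14

# A rule inspects the classes on either side of a potential break and
# returns a decision, or None if it does not apply to that pair.
Rule = Callable[[str, str], Optional[str]]

def lb30(before: str, after: str) -> Optional[str]:
    # (AL | NU) × OP  and  CL × (AL | NU)
    if before in ("AL", "NU") and after == "OP":
        return NO_BREAK
    if before == "CL" and after in ("AL", "NU"):
        return NO_BREAK
    return None

def lb31(before: str, after: str) -> Optional[str]:
    # Break everywhere else: catches every pair no earlier rule decided.
    return BREAK

# Abbreviated on purpose; a full implementation lists LB1 ... LB31 in order.
RULES: list[tuple[str, Rule]] = [("LB30", lb30), ("LB31", lb31)]

def decide(before: str, after: str) -> tuple[str, str]:
    """Return (rule name, decision) from the first rule that matches the pair."""
    for name, rule in RULES:
        decision = rule(before, after)
        if decision is not None:
            return name, decision
    raise RuntimeError("unreachable: LB31 matches every pair")

print(decide("AL", "OP"))  # ('LB30', 'NO_BREAK')
print(decide("ID", "OP"))  # ('LB31', 'BREAK')
```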
    > And I would add a third item covering punctuation that may be used as
    > opening or closing punctuation, either because its role is
    > language- or locale-dependent (notably quotation marks), or because it is
    > intrinsically ambiguous (such as the ASCII vertical single or double quotes).
    >
    I would not recommend considering this, because it is
    already quite well covered by rule LB19.
    > In such a case, if it can't be determined (from the character itself or from
    > the language actually in use) whether a punctuation mark is opening or closing,
    > then the two separate rules should BOTH apply, by making such punctuation
    > marks members of BOTH line-breaking classes, OP and CL.
    >
    See LB19.
    > Now, about the implementation:
    > * for closing punctuation, this case is simple to handle by treating the
    > mark as if it were a combining character following the combining sequence
    > it extends, so that it is handled as part of a larger grapheme
    > cluster. This should apply in all cases except after whitespace and
    > explicit line-break controls (or explicit end-of-verse marks in scripts
    > that use them, e.g. double dandas).
    > * for opening punctuation, the case is a bit more difficult because it
    > requires an additional forward lookup to decide how to handle it.
    > * for ambiguously opening or closing punctuation (mostly the quotation
    > marks discussed above), the best way to handle it is to prohibit line
    > breaks BOTH before AND after it, unless the character before or after
    > it is whitespace, a character that explicitly forces a line break, or one
    > that explicitly indicates that a line break is allowed, such as a disjoiner
    > control.
    >
    >
    The beauty of the rules as written is that no special forward lookup is
    required; instead, a single mechanism works for all characters.
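    As a rough illustration of that single mechanism, the Python sketch below
    follows the pair-table approach: each potential break is decided by one
    lookup on the classes of the nearest non-space characters on either side,
    plus a check for intervening spaces, with no special forward lookup for
    opening punctuation. It is a toy: the six-class table is hand-derived from
    the rules as they read in this discussion (LB13, LB14, LB15, LB19, LB23,
    LB25, LB28, LB30, LB31), it is not the normative pair table, and
    classify() and break_opportunities() are hypothetical names.

```python
# Toy pair-table line breaker: one table lookup per potential break.
# Class subset and entries are hand-derived for illustration only;
# this is NOT the normative UAX #14 pair table.

PROHIBITED = "^"  # no break, even if spaces intervene
INDIRECT = "%"    # break only if one or more spaces intervene
DIRECT = "_"      # break allowed even without spaces

COLUMNS = ["AL", "NU", "ID", "OP", "CL", "QU"]
_ROWS = {
    # before: AL NU ID OP CL QU  (after)
    "AL": "%  %  _  %  ^  %",  # LB28, LB23, LB30, LB13, LB19
    "NU": "%  %  _  %  ^  %",  # LB23, LB25, LB30, LB13, LB19
    "ID": "_  _  _  _  ^  %",  # ID OP falls through to LB31
    "OP": "^  ^  ^  ^  ^  ^",  # LB14: OP SP* ×
    "CL": "%  %  _  _  ^  %",  # LB30, LB13, LB19
    "QU": "%  %  %  ^  ^  %",  # LB19, LB15: QU SP* × OP
}
PAIR_TABLE = {
    before: dict(zip(COLUMNS, cells.split())) for before, cells in _ROWS.items()
}

def classify(ch: str) -> str:
    """Hypothetical toy classifier covering only the characters used below."""
    if ch == " ":
        return "SP"
    if ch == "(":
        return "OP"
    if ch == ")":
        return "CL"  # later Unicode versions assign ")" to the class CP
    if ch in "\"'":
        return "QU"  # ambiguous ASCII quotes, handled by LB19
    if "0" <= ch <= "9":
        return "NU"
    if "\u4e00" <= ch <= "\u9fff":
        return "ID"  # CJK Unified Ideographs block
    if ch.isascii() and ch.isalpha():
        return "AL"
    raise ValueError(f"toy classifier does not handle {ch!r}")

def break_opportunities(classes: list[str]) -> list[int]:
    """Return indices i such that a line break is allowed before classes[i]."""
    breaks: list[int] = []
    prev = None  # class of the most recent non-space character
    for i, cur in enumerate(classes):
        if cur == "SP":
            continue  # LB7: never break before a space; decide at next non-space
        if prev is not None:
            action = PAIR_TABLE[prev][cur]
            spaces_between = classes[i - 1] == "SP"
            if action == DIRECT or (action == INDIRECT and spaces_between):
                breaks.append(i)
        prev = cur
    return breaks

text = 'f(x) "ok" 42'
print(break_opportunities([classify(c) for c in text]))  # [5, 10]
cjk = "\u4e2d(\u6587)"  # 中(文)
print(break_opportunities([classify(c) for c in cjk]))  # [1]
```

    In the first string, breaks fall only after the spaces, before the quote
    and before the number; in the second, a break is allowed between the
    ideograph and "(" because ID OP is left to LB31.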

    A./



    This archive was generated by hypermail 2.1.5 : Fri Jul 27 2007 - 00:36:49 CDT