RE: UAX#14-20: undesirable line breaking opportunities (parentheses and quotation marks)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 26 2007 - 02:25:05 CDT

    Asmus Freytag wrote:
    > With all changes to UAX#14, it is important to make sure that existing
    > implementations can continue to be conformant as much as possible,

    I have never wanted to propose a change that would make existing
    implementations non-conforming.

    I am just pointing out a common fact: in very frequent cases, such as
    optional suffixes, prefixes or infixes enclosed in parentheses and attached
    to a word, breaking the line around the parentheses by default is
    completely undesirable. It is in fact not needed, because a parenthesized
    group only needs to allow a line break at the nearest word-separating
    whitespace. The only exception is scripts like Han, which allow line breaks
    between most ideographs and where no space is present before the opening
    parenthesis or after the closing one.
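
    The behaviour described above can be illustrated with a minimal Python
    sketch. It uses a deliberately simplified, hypothetical model (it is NOT
    the UAX#14 algorithm, and the names is_han, break_opportunities and
    segments are made up for this example): a line may break only after
    whitespace, or between Han ideographs and the parentheses next to them,
    so a parenthesized affix stays attached to its word.

```python
def is_han(ch: str) -> bool:
    # Rough approximation (assumption): only the basic CJK Unified
    # Ideographs block U+4E00..U+9FFF counts as Han here.
    return "\u4e00" <= ch <= "\u9fff"


def break_opportunities(text: str) -> list[int]:
    """Return the indices i at which a line may break before text[i].

    Hypothetical, simplified rules (NOT the UAX #14 algorithm):
      1. break after a run of whitespace, before the next non-space;
      2. break between two Han ideographs, before an opening parenthesis
         that follows a Han ideograph, and after a closing parenthesis
         that precedes one;
      3. nowhere else, so a letter and an attached parenthesis
         (as in "word(s)") are never separated.
    """
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev.isspace() and not cur.isspace():
            breaks.append(i)
        elif is_han(prev) and (is_han(cur) or cur == "("):
            breaks.append(i)
        elif prev == ")" and is_han(cur):
            breaks.append(i)
    return breaks


def segments(text: str) -> list[str]:
    """Split text into the unbreakable pieces between break opportunities."""
    cuts = [0] + break_opportunities(text) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


print(segments("one or more word(s)"))  # ['one ', 'or ', 'more ', 'word(s)']
print(break_opportunities("中(注)文"))    # [1, 4]
```

    Under these rules "one or more word(s)" can only be broken at its spaces,
    a word such as "colo(u)r" gets no break opportunity at all, and a Han
    sequence still allows a break before "(" and after ")".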

    > Your further
    > statements show that you haven't fully understood some of the core
    > concepts of the way the default line breaking algorithm is intended to
    > work.

    Please refrain from presumptions about my understanding of the line
    breaking algorithm. They are not necessary, and with such remarks you are
    trying to insult me. I can read the text; if I had to specify every detail
    of a desired change, there would be no need to discuss it here.

    In fact, the most important thing is that I am pointing out a real problem
    that is currently not handled. Suggestions must be made to solve it before
    an effective technical specification can address it.

    I have not suggested anything that would break the line breaking
    algorithm. (Note, however, that if the existing algorithm already inserts
    a line break within a single word merely because it contains parentheses,
    then something must change, since such a break is undesirable.)

    I have simply not yet proposed the exact technical rule needed to handle
    undesirable line breaks around paired punctuation such as parentheses and
    quotation marks.

    Please reconsider this problem with the VERY COMMON example of optional
    plural forms, as in:
    "one or more word(s)"
    And consider the equivalent cases that DO occur in almost all languages
    written in Latin script (I gave examples in French and have shown that the
    case exists in English too; it is very easy to find many examples in
    German, Spanish or Italian). In fact this is not restricted to languages
    written in Latin script: you will find the same cases in Greek, Russian,
    Hebrew...

    Then consider the various discussions that have taken place here about the
    transcription of Hebrew into Latin script: parentheses are commonly used in
    the middle of a word to surround missing, implied, deleted or optional
    letters.

    Look at the many discussions on this list where characters within the same
    word need to be transcribed using some notation between parenthesis-like
    pairs...

    I am just demonstrating that this case is VERY FREQUENT but still NOT
    HANDLED correctly in MOST cases.

    For this reason, in many web pages, if we want to avoid undesired line
    breaks in narrow table columns, we currently have to wrap these unbreakable
    words in a CSS style such as:

            <html>
            ...
            One (or more) <span style="white-space:nowrap">word(s)</span>
            ...
            </html>

    The fact that we have to do this BREAKS the separation model between
    content and style. It is also not possible in many cases where the text is
    assembled from several sources that are NOT HTML-encoded, and it will not
    work with texts in XML sources whose schema offers no style option in its
    data model.
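
    Where the final output is HTML, the workaround can at least be applied
    mechanically instead of by hand. The following Python sketch is only an
    illustration: the regular expression is an assumed heuristic (a word
    character touching the outer side of a parenthesis), and the function name
    wrap_attached_parens is hypothetical. It HTML-escapes plain text and wraps
    words such as "word(s)" in a span styled white-space:nowrap.

```python
import html
import re

# Hypothetical heuristic: a whitespace-delimited token contains an attached
# parenthesized affix if a word character touches the outer side of a
# parenthesis, as in "word(s)" (suffix) or "(re)start" (prefix).
ATTACHED_PAREN = re.compile(r"\S*(?:\w\(|\)\w)\S*")

NOWRAP_OPEN = '<span style="white-space:nowrap">'


def wrap_attached_parens(text: str) -> str:
    """HTML-escape plain text and wrap tokens with attached parentheses
    in a span that forbids line breaks inside them."""
    out = []
    pos = 0
    for m in ATTACHED_PAREN.finditer(text):
        # Escape the plain text between matches unchanged.
        out.append(html.escape(text[pos:m.start()]))
        # Wrap the matched token so browsers will not break inside it.
        out.append(NOWRAP_OPEN + html.escape(m.group()) + "</span>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


print(wrap_attached_parens("One (or more) word(s)"))
```

    Note that "(or" and "more)" are left alone because their parentheses are
    separated from the neighbouring words by spaces. This still mixes
    presentation into the content, which is exactly the objection raised
    above, and it does nothing for non-HTML sources.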

    So please don't rant against me; I have been polite so far. What I am
    pointing out is not a special case, and the line breaking algorithm
    should be able to handle the most frequent cases. This case with
    parentheses is VERY FREQUENT.



    This archive was generated by hypermail 2.1.5 : Thu Jul 26 2007 - 02:28:49 CDT