Re: [css3-text] New Working Draft

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Apr 20 2011 - 21:58:14 CDT

    I disagree, because it breaks the inherent nature of the script. Joins
    in Arabic are mandatory, and create "super grapheme clusters".

    When you say that « it does not consider morphemic, syllabic, or other
    boundaries », this is already wrong because it already considers the
    default grapheme cluster boundaries. Note that the default grapheme
    boundaries were designed only to be locale neutral. But here we are
    speaking about localization where the language and its script will
    matter, including in its fundamental properties. Joining types in
    Arabic are key parts of the script.

    But the earlier part of the specification says nothing about them,
    and all that is left is the upper levels, where trying to find
    language-correct boundaries will fail. After those levels, there
    should still be a level related to the script itself (independently
    of the language), before trying the last-chance "emergency" breaks.
    This intermediate level can still be prioritized, just as in the
    previous steps.

    Otherwise, chances are very high that even the expected joining types
    will not be rendered with the expected shape, and there will be
    incorrect rendering of other elements in the now broken join, i.e.
    characters that are not starters of default grapheme clusters.

    It won't be worse, even if it is not strictly a morphemic or syllabic
    break. And in most cases it will produce at least a correct syllabic
    break, even without any morphemic or syllabic analysis (that step is
    optional and much more complex to implement). The joining analysis
    for Arabic, on the other hand, is very simple to compute (and fully
    standardized for the Arabic script, without requiring any linguistic
    knowledge).
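
    As a rough illustration of how little is needed, here is a minimal
    Python sketch that lists the offsets inside a word where a break
    would not cut a join. The joining-type table is a hand-picked,
    simplified subset for basic Arabic letters (an assumption made for
    this example, not the full Unicode data), and the function names are
    hypothetical:

```python
import unicodedata

# Assumption: simplified subset of Unicode Joining_Type values covering
# only basic Arabic letters; real code should use the full Unicode data.
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")
NON_JOINING = {"\u0621"}  # isolated hamza

def joining_type(ch):
    """Return a one-letter joining type: T, C, R, D or U."""
    if unicodedata.category(ch) == "Mn":
        return "T"  # transparent: combining marks do not affect joining
    if ch == "\u0640":
        return "C"  # tatweel joins on both sides
    if ch in RIGHT_JOINING:
        return "R"  # joins only to the preceding letter
    if "\u0620" <= ch <= "\u064A" and ch not in NON_JOINING:
        return "D"  # dual-joining
    return "U"      # everything else is treated as non-joining

def disjoint_break_offsets(word):
    """Offsets i such that breaking before word[i] does not cut a join."""
    offsets = []
    for i in range(1, len(word)):
        if joining_type(word[i]) == "T":
            continue  # never separate a combining mark from its base
        # Skip back over transparent marks to the previous base letter.
        j = i - 1
        while j > 0 and joining_type(word[j]) == "T":
            j -= 1
        before = joining_type(word[j])
        after = joining_type(word[i])
        joined = before in ("D", "C") and after in ("D", "R", "C")
        if not joined:
            offsets.append(i)
    return offsets
```

    For the word kaf-teh-alef-beh (U+0643 U+062A U+0627 U+0628), the
    only offset returned is the one after the alef, because alef does
    not join to the letter that follows it.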

    And yes, even in that case you could still insert the hyphenation
    symbol to show that the word was actually broken (it is common
    practice to insert it, even in the Latin script and even if this is
    not the preferred syllabic or morphemic break position, which can only
    be inferred from language-specific rules and a lookup dictionary for
    handling the many exception cases).

    The hyphenation symbol is generally very narrow and, if needed, it
    can still overflow a bit into the margin. I've never seen any
    practical case where it could not be inserted, even in the narrowest
    columns of a table. The only exception is rendering with monospaced
    fonts, with a minimal column separation no larger than a thin space
    (there should always be some minimal gap between columns of text, and
    a small compression (kerning, glyph stretching) is still possible when
    those characters already contain some inner advance gaps on both
    sides, at least for the hyphenation symbol itself).

    Note that overflow in the padding area does not cause this hyphen to
    be completely invisible, even if the overflow is set to hidden.

    The only case where it would not appear is when rendering on a
    monospaced grid of a text terminal, where column separation is only
    marked by distinct colors, or distinct style attributes (bold, italic,
    blinking, underline/overline/overstrike decorations...), and the
    column is reduced to only one "character" (more precisely a single
    glyph for the complete default grapheme cluster).

    The choice of the hyphenation symbol is also a property of the script.
    In many East and South-East Asian scripts, there is not even a symbol
    for that, because a break can occur between any two grapheme clusters.

    Note: in Indic scripts, the danda and double-danda punctuation marks
    should be treated like the commas and stops in your spec, and
    preferably not left alone on the next line, even if that means they
    fall within the margin (you showed cases for East Asian scripts only:
    Han, Hiragana, Katakana, Hangul, Bopomofo, Yi, Mongolian...)

    But the same rule could just as well apply to other "narrow"
    punctuation marks used in Indic or European scripts, such as the
    colon, semicolon, exclamation mark, or single quotes (when they do not
    follow a non-breaking space). The available margin at the end of the
    line can typically accommodate these punctuation marks in emergency
    situations, even if this makes the margin slightly unaligned.

    Philippe.

    2011/4/21 fantasai <fantasai.lists@inkedblade.net>:
    > On 04/20/2011 04:47 PM, Philippe Verdy wrote:
    >>
    >> [css3-text]:
    >>
    >> "7.2. Emergency Wrapping: the ‘word-wrap’ property
    >> [...]
    >> break-word
    >>   An unbreakable "word" may be broken at an arbitrary point if there
    >> are no otherwise-acceptable break points in the line. Shaping
    >> characters are still shaped as if the word were not broken, and
    >> grapheme clusters must together stay as one unit.[...]"
    >>
    >> Here I also suggest that contextually shaped characters should not
    >> just keep their normal shaping, but the joining types should be taken
    >> into account, to avoid breaking between joined character pairs, with a
    >> higher precedence for disjoined characters.
    >
    > Actually, the fact that the join is broken has the advantage of making
    > it more clear that this is an improper wrap. It is /better/ to break
    > there than at a disjoint boundary.
    >
    > The purpose of "word-wrap: break-word" is to handle emergency cases,
    > where there are no other breakpoints. It does not insert hyphens. It
    > does not consider morphemic, syllabic, or other boundaries. It just
    > breaks somewhere arbitrary to avoid overflow.
    >
    > So I disagree with your suggestion and believe the spec is correct
    > as it stands.
    >
    > ~fantasai
    >


