Re: U+2026 break behaviour

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Mon Sep 05 2005 - 01:04:17 CDT

    Doug Ewell wrote:

    >António Martins-Tuválkin <antonio at tuvalkin dot web dot pt> wrote:
    >
    >
    >
    >>Microsoft Internet Explorer 6 has rendered the sequence U+0021 :
    >>EXCLAMATION MARK and U+2026 : HORIZONTAL ELLIPSIS with a soft line
    >>break in between. Is this the expected behaviour? At least it doesn't
    >>happen that way with U+0021 U+002E U+002E U+002E...
    >>
    >>
    >
    >Well, obviously that's not the expected behavior,
    >
    It is - as far as Unicode is concerned. The line breaking classes of
    U+0021 and U+2026 are EX and IN, respectively. Although IN is short for
    "inseparable", this really means that characters in this class are
    inseparable from other characters in the class and from some other
    characters by special rules. The line breaking rules in UAX #14
    involving IN are LB 16 (preventing a break between AL, ID, IN, or NU and
    IN) and LB 18 c (preventing a line break between a Korean syllable
    block and IN).

    A quick check of Table 2 of UAX #14 also shows that between EX and IN
    there is "_", a direct break opportunity. (Whether this is a good thing
    or not is a different issue.)
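
    The pair lookup can be illustrated with a toy sketch in Python. It is
    not the UAX #14 algorithm: it models only the rule about IN quoted
    above plus one rule that is my addition here, namely that U+002E has
    class IS and that UAX #14 forbids a break before IS. The function and
    table names are hypothetical, and every other pair is simply treated
    as a break opportunity.

```python
# Toy model only: three characters and two pair rules, not full UAX #14.
LINE_BREAK_CLASS = {
    "\u0021": "EX",  # EXCLAMATION MARK
    "\u2026": "IN",  # HORIZONTAL ELLIPSIS
    "\u002E": "IS",  # FULL STOP (class added for this sketch)
}

# Classes after which a break before IN is prevented (the LB 16 rule above).
NO_BREAK_BEFORE_IN = {"AL", "ID", "IN", "NU"}


def break_allowed(before: str, after: str) -> bool:
    """Return True if this toy model allows a line break between two characters."""
    left = LINE_BREAK_CLASS[before]
    right = LINE_BREAK_CLASS[after]
    if right == "IS":
        # Assumption taken from UAX #14: no break before IS.
        return False
    if right == "IN" and left in NO_BREAK_BEFORE_IN:
        return False
    # Everything else counts as a direct break opportunity in this sketch.
    return True


def break_opportunities(text: str) -> list[int]:
    """List the indices i where a break is allowed between text[i-1] and text[i]."""
    return [i for i in range(1, len(text)) if break_allowed(text[i - 1], text[i])]
```

    Under these assumptions, break_opportunities("!\u2026") reports an
    opportunity at index 1, while "!..." and "\u2026\u2026" report none,
    which matches the difference observed in the original question.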

    HTML specifications and browsers do not claim conformance to the Unicode
    Standard, though. (The so-called document character set of HTML is
    defined in terms of Unicode, but this really means only that character
    references of the form &#decimal; and &#xhexadecimal; are interpreted by
    mapping the numbers to Unicode code points. There is no requirement that
    processing of characters take place by Unicode rules.)
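
    As a small illustration of that mapping, here is a sketch using the
    html module of Python's standard library, which is merely a
    convenient stand-in and not something the HTML specifications
    mention. The helper name decode_references is hypothetical. Note that
    the decoding yields code points only; it says nothing about how line
    breaking is then done.

```python
import html


def decode_references(markup: str) -> str:
    """Replace character references with the code points they denote."""
    return html.unescape(markup)


# Decimal and hexadecimal references for the two characters discussed here.
decoded = decode_references("&#33;&#x2026;")

# Show the result as U+XXXX notation.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]
```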

    IE and some other browsers have started applying some of the Unicode
    line breaking rules, but this is regarded by many as a problem rather
    than a useful thing, especially since browsers apply the rules
    indiscriminately. Such behavior is described on my page
    http://www.cs.tut.fi/~jkorpela/html/nobr.html

    >and in fact the
    >following page looks just fine to me on Internet Explorer 6.0.2800.1106
    >under Windows Me:
    >
    >http://users.adelphia.net/~dewell/bang-ellipsis.html
    >
    >
    >
    It exhibits the behavior described. When you make the browser window
    narrow enough, a line break will appear between the exclamation mark and
    the horizontal ellipsis (e.g., in the heading).

    The original message from António Martins-Tuválkin says that it is a
    forwarded message, quoting a message posted to the list on October 22,
    2004. I was unable to find such a message in the archives. Does someone
    know what is going on? (There are also other "forwarded messages" posted
    recently.)


