Re: When do you use U+2024 ONE DOT LEADER instead of U+002E FULL STOP?

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 30 2003 - 17:49:26 EDT

    Philippe Verdy vamped:

    > > > For example I would not be shocked if a text using it was rendered with
    > > > a monospaced font, where the base line of the character cell shows
    > > > multiple tiny dots, that create a contiguous dotted line when multiple
    > > > U+2024 characters (one per display cell) are used to indent the text in
    > > > columns.
    > > >
    > > > Of course with proportional fonts this character would display at least
    > > > (and preferably) a single dot. Any use of this character that assumes
    > > > it is a symbol consisting in a single dot aligned on the baseline seems
    > > > to abuse the semantic of this character, which is not a punctuation,
    > > > but really a styling character used instead of an "invisible" thin
    > > > space.
    > >

    And Jim Allan asked:

    > > Where is this behavior indicated by Unicode specifications?
    > >
    > > Such behavior appears to me to be a non-standard extension on Unicode,
    > > interpreting what Unicode classes as a General Punctuation character as
    > > instead a Formatting Character.
    > >
    > > But I don't see how conforming applications could assume this semantic
    > > for the character when reading in plain text Unicode or writing plain
    > > text Unicode.
    > >
    > > What then is U+2025 TWO DOT LEADER?

    And then Philippe Verdy continued to improvise:

    > For me this one is a punctuation, commonly used to designate
    > a separator between bounds of intervals like [0..1] (it is
    > generally surrounded by a thin space on both sides with strict
    > typography). It should not be used to create arbitrary lengths
    > of leaders.

    What he is talking about here is generally represented by
    the sequence <U+002E, U+002E>, in other words, just two
    full stops, as in the example given "[0..1]". Typographical
    rules then deal with any issues of spacing around or between
    the dots.

    >
    > The three dot leader is also a punctuation (normally not
    > prefixed by any space, but followed by a large space like
    > for the full dot). It should not be used to create arbitrary
    > lengths of leaders.

    This is a reference to U+2026 HORIZONTAL ELLIPSIS, and Philippe
    is correct that that should not be used to create arbitrary
    leaders.

    > The one-dot leader should have no other purpose than to be
    > used in sequences of arbitrary length.

    This statement is only very accidentally true. Explanation
    below.

    > The whole sequence of single-dots leaders like this forms a
    > single token with the semantic of a word separator, where the
    > number of displayed dots is not really relevant for the reader
    > of text whatever its rendering style or fonts.

    But this is absolutely false, as Jim Allan suggested.
    U+2024 ONE DOT LEADER is a graphic character, whose glyph
    consists of a small baseline dot, and whose General Category
    is Po (Other Punctuation). It cannot be used conformantly as
    if it were a formatting control standing in for a rich text
    representation of a leader object (e.g. in a generated
    Table of Contents in a Word or FrameMaker document).
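
    The General Category claim can be checked against any copy of the
    Unicode Character Database. A minimal sketch, assuming Python's
    standard unicodedata module; the helper name general_category and
    the choice of U+200E LEFT-TO-RIGHT MARK as a contrasting format
    control are illustrative only:

```python
import unicodedata

ONE_DOT_LEADER = "\u2024"
FULL_STOP = "\u002e"
LEFT_TO_RIGHT_MARK = "\u200e"  # a real format control, shown for contrast


def general_category(ch: str) -> str:
    """Return the two-letter General Category of a single character."""
    return unicodedata.category(ch)


for ch in (ONE_DOT_LEADER, FULL_STOP, LEFT_TO_RIGHT_MARK):
    # Print code point, character name, and General Category.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {general_category(ch)}")
```

    Both dots report Po (Other Punctuation); only the format control
    reports Cf.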

    > I just think that this 1-dot leader is used as a way to transcode
    > within a single string what was initially a tabulation decorated
    > by some markup system,

    False.

    Now, here is the true story of U+2024.

    It is a compatibility character, introduced for compatibility
    with XCCS (Xerox Character Code Standard) 1980, where it
    was mapped to the coded character 356B/242B (0xEEA2),
    described as "Leader, one-dot on an en body".

    Its use in XCCS would have been to create leaders manually,
    by lining up a sequence of "one-dot on an en body" to create
    a sufficiently long leader. Its rationale in Unicode would be
    either to map to data created in XCCS or to lay out text
    manually using a comparable mechanism, in cases where one wished
    to distinguish the "dots" thus used from U+002E FULL STOP.

    U+2025 TWO DOT LEADER is also an XCCS compatibility character.
    It corresponds to XCCS 356B/243B (0xEEA3) "Leader, two-dot
    on an en body" *and* to 041B/105B (0x2145) "Leader, two-dot
    on an em body". The difference in width was considered
    a formatting distinction and was unified away in creating
    the U+2025 encoded character, as preserving that distinction
    in plain text was considered unnecessary by the Xerox
    representative to the committee at the time.

    U+2026 HORIZONTAL ELLIPSIS maps to the ellipsis seen in a
    number of legacy character encodings, including the Macintosh
    character sets, but also maps to an XCCS character: 041B/104B
    (0x2144) "Leader, three-dot on an em body".
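
    The XCCS codes above are written as two numbers with a trailing
    "B", followed by a hexadecimal value. A small sketch of how the two
    notations line up, assuming the "B" marks octal and that each pair
    is a high byte and a low byte (which is consistent with the values
    in parentheses); the function name xccs_to_int is hypothetical:

```python
def xccs_to_int(code: str) -> int:
    """Convert an XCCS pair such as '356B/242B' to a 16-bit integer.

    Assumption: each half is an octal byte, with 'B' as the octal marker.
    """
    high, low = (int(part.rstrip("B"), 8) for part in code.split("/"))
    return (high << 8) | low


for code in ("356B/242B", "356B/243B", "041B/105B", "041B/104B"):
    # Show the octal pair next to its hexadecimal equivalent.
    print(code, f"0x{xccs_to_int(code):04X}")
```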

    All *three* of these characters should be considered
    compatibility characters. Indeed, they formally *are*
    "compatibility decomposable characters" (Chapter 3, Definition
    D21), since they each have compatibility decompositions
    to one or more U+002E FULL STOP characters.
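
    The compatibility decompositions can be inspected directly. A
    minimal sketch, again assuming Python's standard unicodedata
    module; the helper name fold_compat is illustrative, not part of
    any standard API:

```python
import unicodedata

LEADER_CHARS = ("\u2024", "\u2025", "\u2026")


def fold_compat(text: str) -> str:
    """Apply compatibility decomposition (NFKD) to the text."""
    return unicodedata.normalize("NFKD", text)


for ch in LEADER_CHARS:
    # decomposition() returns the raw mapping field, e.g. '<compat> 002E'.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        repr(fold_compat(ch)),
    )
```

    Each of the three carries a <compat> mapping, and NFKD (or NFKC)
    folds it to one, two, or three FULL STOP characters respectively.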

    That last fact should be taken as a hint that for most
    purposes, manual leaders should just be sequences of FULL STOP
    characters (as you will see, for instance, in the plain text
    representations of Internet Drafts or RFCs).
    In any rich text format, leaders are styled formatting objects
    (somewhat similar to tabulations, as Philippe suggested), but
    that does *not* make U+2024 a format character (LEADER
    PLACEHOLDER, or whatever). It is exactly what it claims to
    be: a compatibility character, punctuation, with a single
    baseline dot as its glyph.
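
    For plain text, a manual leader built from FULL STOP characters
    might look like the following. This is only a sketch: the function
    name toc_line, the default width, and the minimum of three dots are
    assumptions made for illustration, not anything prescribed by the
    standard or by RFC formatting rules.

```python
def toc_line(title: str, page: int, width: int = 60) -> str:
    """Join a title and page number with a run of U+002E FULL STOP.

    The line is padded to `width` columns when the title is short enough;
    otherwise a minimal three-dot leader is used and the line runs long.
    """
    page_str = str(page)
    # Reserve one space on each side of the leader run.
    dots = width - len(title) - len(page_str) - 2
    dots = max(dots, 3)
    return f"{title} {'.' * dots} {page_str}"


print(toc_line("1. Introduction", 3, width=40))
print(toc_line("2. Compatibility Characters", 12, width=40))
```

    In a monospaced rendering the page numbers line up in the last
    column, and the text contains nothing but ordinary punctuation.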

    --Ken


