Re[2]: Fixed Width Spaces (was: Printing and Displaying Dependent Vowels)

From: Asmus Freytag (
Date: Fri Apr 02 2004 - 18:01:21 EST

    Somebody wrote:
    > > non-breaking and non-stretching are presentational properties, not
    > > semantic ones. They don't change the meaning of the space: it's still
    > > just a space, not a hyphen or the letter "g". They don't affect
    > > non-visual media; we don't break lines in spoken speech. "Louis XVI"
    > > is semantically different from "Louis' head" because the former is a
    > > bare noun whereas the latter is a noun phrase, but as far as the reader
    > > is concerned, they're both separated with "a space". Whether the space
    > > breaks or not or stretches or not has no effect on either the meaning
    > > or correctness of the text. It only affects its (visual) aesthetic
    > > quality.

    This argument is misleading in one very important sense.

    There are two senses of 'semantic' employed when discussing coded characters,
    in particular, Unicode characters.

    One refers to the (part of the) meaning of the text that is carried by the
    character, in other words, how the semantics of a text are represented by
    the character sequence in which it is encoded.

    The other refers to the behavior that a character has in text processing.
    This sense is closely tied to the identity of a character.

    For layout control characters and characters that have layout control features
    associated with them, these senses can intersect and overlap in interesting
    ways.

    Think of the example of SHY (soft hyphen), used to mark possible hyphenation
    points in a word. A while ago we had a discussion on this list where there was
    an interesting minimal pair of German compounds:

    Wachs|tu-be (tube of (or made of) wax)
    Wach|stu-be (guard room)

    The word boundary (which is also a hyphenation point) is marked as |; a
    hyphenation point is marked with -. In other words, each word has two SHYs
    in it, but not both in the same location.

    I can remove the SHYs from these words, and if the text is not broken
    across lines at that point, its meaning for the human reader doesn't
    change. With context, the text is unambiguous, but if there isn't enough
    context, the text is clearly ambiguous.

    However, equally clearly, if the SHYs are left in, the text is (in its
    representation) entirely unambiguous, even if that semantic difference is not
    surfaced to the reader (except if a line break fortuitously happens to be
    in the first half of the word).
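
    To make the point concrete, here is a minimal sketch in Python. It
    assumes each compound is encoded with U+00AD SOFT HYPHEN at exactly the
    two points marked above; the names strip_shy and break_points are
    hypothetical helpers for illustration, not part of any library.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

# Assumed encodings of the two compounds: one SHY at each point
# marked with "|" or "-" in the message.
wax_tube = f"Wachs{SHY}tu{SHY}be"    # Wachs|tu-be
guard_room = f"Wach{SHY}stu{SHY}be"  # Wach|stu-be


def strip_shy(text: str) -> str:
    """Return the text as the reader sees it when no line break occurs."""
    return text.replace(SHY, "")


def break_points(text: str) -> list[int]:
    """Return offsets (in the SHY-free text) where a SHY permits a break."""
    points = []
    offset = 0
    for ch in text:
        if ch == SHY:
            points.append(offset)
        else:
            offset += 1
    return points


if __name__ == "__main__":
    # Without the SHYs the two words are the same character sequence.
    print(strip_shy(wax_tube) == strip_shy(guard_room))
    # With the SHYs the encoded representations differ.
    print(wax_tube == guard_room)
    print(break_points(wax_tube), break_points(guard_room))
```

    With the SHYs stripped, the two strings compare equal; with them in
    place, the encoded sequences differ only in where the first break
    opportunity falls.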

    Of course a (good) screen reader could pick up on the difference and split the
    compound correctly when pronouncing it.

