From: Asmus Freytag (firstname.lastname@example.org)
Date: Fri Apr 02 2004 - 18:01:21 EST
> > non-breaking and non-stretching are presentational properties, not
> > semantic ones. They don't change the meaning of the space: it's still
> > just a space, not a hyphen or the letter "g". They don't affect
> > non-visual media; we don't break lines in spoken speech. "Louis XVI"
> > is semantically different from "Louis' head" because the former is a
> > bare noun whereas the latter is a noun phrase, but as far as the reader
> > is concerned, they're both separated with "a space". Whether the space
> > breaks or not or stretches or not has no effect on either the meaning
> > or correctness of the text. It only affects its (visual) aesthetic
> > quality.
This argument is misleading in one very important sense.
There are two senses of 'semantic' employed when discussing coded characters,
in particular, Unicode characters.
One refers to the (part of the) meaning of the text that is carried by the
character, in other words, how the semantics of a text are represented by
the character sequence in which it is encoded.
The other refers to the behavior that a character has in text processing.
This sense is closely tied to the identity of a character.
For layout control characters and characters that have layout control features
associated with them, these senses can intersect and overlap in interesting
ways.
Think of the example of SHY (soft hyphen), used to mark possible hyphenation
points in a word. A while ago we had a discussion on this list where there was
an interesting minimal pair of German compounds:
Wachs|tu-be (tube of (or made of) wax)
Wach|stu-be (guard room)
The word boundary (which is also a hyphenation point) is marked as |, a
hyphenation point is marked with -. In other words, each word has two SHYs,
but not both in the same locations.
I can remove the SHYs from these words, and if the text is not broken
at that point, its semantics for the human reader don't change. With
enough context, the text is unambiguous, but if there isn't enough context,
the text is clearly ambiguous.
However, equally clearly, by leaving the SHY in the text, it is (in its
representation) entirely unambiguous, even if that semantic difference is not
surfaced to the reader (except if a line break fortuitously happens to be
in the first half of the word).
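To make the point concrete, here is a minimal sketch in Python (my choice of
language for illustration, nothing the discussion depends on). It assumes each
marked position, | and -, is encoded as U+00AD SOFT HYPHEN; the names
wax_tube, guard_room and strip_shy are made up for this example.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN

# The two compounds from the example, with a SHY at each marked position.
wax_tube = "Wachs" + SHY + "tu" + SHY + "be"    # Wachs|tu-be
guard_room = "Wach" + SHY + "stu" + SHY + "be"  # Wach|stu-be


def strip_shy(text: str) -> str:
    """Return text with every soft hyphen removed."""
    return text.replace(SHY, "")


# With the SHYs in place, the two encodings are distinct strings.
encoded_differ = wax_tube != guard_room

# With the SHYs removed, both collapse to the same letters, "Wachstube".
stripped_equal = strip_shy(wax_tube) == strip_shy(guard_room)

print(encoded_differ, stripped_equal)
```

So the distinction lives entirely in the encoded sequence: drop the SHYs and
a process has nothing left to tell the two words apart.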
Of course a (good) screen reader could pick up on the difference and split the
compound correctly when pronouncing it.
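As a rough sketch of what such a process would have to work with, the
following Python splits each word at its SHYs. This is not how any particular
screen reader works; the function name segments is hypothetical, and treating
the first segment as the first element of the compound relies on the fact
that, in this example, the first SHY happens to sit at the word boundary.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def segments(word: str) -> list[str]:
    """Split a word at each soft hyphen, dropping the SHYs themselves."""
    return word.split(SHY)


wax_tube = "Wachs" + SHY + "tu" + SHY + "be"    # Wachs|tu-be
guard_room = "Wach" + SHY + "stu" + SHY + "be"  # Wach|stu-be

wax_parts = segments(wax_tube)      # ['Wachs', 'tu', 'be']
guard_parts = segments(guard_room)  # ['Wach', 'stu', 'be']

# In this example only, the first SHY coincides with the word boundary,
# so the first segment is the first element of the compound.
print(wax_parts[0], guard_parts[0])
```

The segment lists differ even though the visible letters are identical,
which is the information a process could use to choose between the two
readings.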