# Re: SOFT HYPHEN

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Nov 16 1999 - 08:42:40 EST

Klaus Weide wrote on 1999-11-10 12:27 UTC:
> I assume you are familiar with the dissenting treatise at
>
> http://www.hut.fi/~jkorpela/shy.html

I wasn't familiar with it, but a quick look at it tells me that I wish I
had written it myself. I like it very much and fully agree with Jukka.

I really do believe that

- HTML documents should never contain soft hyphens. HTML documents
are unformatted and therefore should not contain characters such as
SHY that can only have been inserted as the result of a paragraph
formatting process.

- If people feel a real need to have control characters inside the text
that control hyphenation, then they could introduce a new ZERO WIDTH
HYPHENATION POINT, which would have semantics similar to \- in TeX
(marking an explicit hyphenation opportunity in this word,
preferably also suppressing at the same time any implicit hyphenation
points that the hyphenation algorithm would otherwise provide).

Maybe there could even be both ZERO WIDTH HYPHENATION POINT and ZERO
WIDTH EXCLUSIVE HYPHENATION POINT, depending on whether its presence
disables the normal hyphenation algorithm for the rest of the word
or not. (See also the \- in TeX versus the "- in the German.TeX
macro package, the latter of which is non-exclusive.)

- Inserting hyphenation points directly into the running text of a
document is usually a very bad idea, because it does not help when
the text is reformatted later, it leads to inconsistent hyphenation
across a document, and it complicates search/replace algorithms.
The right solution is to allow the user to add to the document an
extension or exception list for the hyphenation dictionary, covering
all those words for which the default hyphenation algorithm leads to
unsatisfactory results. This is similar to TeX's
\hyphenation{Do-nau-dampf-schiff-fahrt} command, which makes sure in
the header that this remarkably long word will be hyphenated correctly
everywhere (!) in the document, no matter how often it appears.
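
The difference between a document-wide exception list and explicit
hyphenation points in the running text could be sketched roughly as
follows in Python. This is only an illustration under stated
assumptions: `naive_algorithm` is a made-up placeholder for a real
hyphenation algorithm, the `|` marker is a hypothetical stand-in for the
proposed ZERO WIDTH HYPHENATION POINT, and the case-insensitive lookup
is a simplification of this sketch, not a description of TeX internals.

```python
from typing import Callable, Iterable, List, Tuple

Algorithm = Callable[[str], List[int]]


def split_points(text: str, marker: str) -> Tuple[str, List[int]]:
    """Remove every `marker` from `text`; return the plain word and the
    character offsets at which the markers stood."""
    parts = text.split(marker)
    points, pos = [], 0
    for part in parts[:-1]:
        pos += len(part)
        points.append(pos)
    return "".join(parts), points


def naive_algorithm(word: str) -> List[int]:
    """Placeholder only: allows a break after every fourth character."""
    return list(range(4, len(word) - 1, 4))


class Hyphenator:
    """Default algorithm plus a per-document exception list, in the
    spirit of \\hyphenation{...} declared once in the header."""

    def __init__(self, algorithm: Algorithm, exceptions: Iterable[str] = ()):
        self.algorithm = algorithm
        self.exceptions = {}
        for pattern in exceptions:
            word, points = split_points(pattern, "-")
            self.exceptions[word.lower()] = points  # assumption: ignore case

    def break_points(self, word: str) -> List[int]:
        # An exception entry overrides the algorithm wherever the word occurs.
        if word.lower() in self.exceptions:
            return list(self.exceptions[word.lower()])
        return sorted(self.algorithm(word))


def with_explicit(marked: str, algorithm: Algorithm, exclusive: bool,
                  marker: str = "|") -> Tuple[str, List[int]]:
    """Explicit points in the running text. Exclusive points suppress the
    algorithm for this word; non-exclusive points are merged with it."""
    word, explicit = split_points(marked, marker)
    if exclusive:
        return word, explicit
    return word, sorted(set(explicit) | set(algorithm(word)))


def show(word: str, points: List[int]) -> str:
    """Render a word with '-' at each permitted break, for inspection."""
    pieces, last = [], 0
    for p in points:
        pieces.append(word[last:p])
        last = p
    pieces.append(word[last:])
    return "-".join(pieces)


if __name__ == "__main__":
    h = Hyphenator(naive_algorithm, ["Do-nau-dampf-schiff-fahrt"])
    word = "Donaudampfschifffahrt"
    print(show(word, h.break_points(word)))  # Do-nau-dampf-schiff-fahrt
    print(with_explicit("Schiff|fahrt", naive_algorithm, exclusive=True))
    print(with_explicit("Schiff|fahrt", naive_algorithm, exclusive=False))
```

With the exception list, the running text stays free of markers, so
plain search/replace still works and every occurrence of the word is
hyphenated the same way. With `with_explicit`, each occurrence has to
carry its own markers.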

So I don't much like the idea of adding a ZERO WIDTH {EXCLUSIVE}
HYPHENATION POINT to Unicode, because implementing it would probably be
abused as an excuse for not adding the only proper solution (hyphenation
exception lists). But even more I dislike the idea of simply abusing SHY
as an ill-defined ZERO WIDTH (EXCLUSIVE?) HYPHENATION POINT. See HTML. Yuck.

Markus

```
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
```
