Re: Latin ligatures and Unicode

From: Eberhard Pehlemann (
Date: Sun Jan 02 2000 - 11:33:59 EST

I have some comments and questions concerning the long s, as it has been
mentioned several times during the last two weeks:

1. Why has U+017F [LATIN SMALL LETTER LONG S] been introduced as a character of
its own? (My question from 19.12.1999 has not yet been fully answered.)

As U+017F [long s] and U+0073 [LATIN SMALL LETTER S] are defined as two
different characters, there must be a characterization of their different
meanings, I suppose.

2. What is the semantics of U+017F [long s]? Is there any statement by the

U+017F [long s] has been suggested as a means to trigger ligation behavior. At
03:59 PM 12/28/99 -0800, Kenneth Whistler wrote:

> > Wac&hstube and Wac&hF&tube
> > (where F in the 2nd example is the long s and & is the ZWL)
> Wait just a dang minute! If you *encode* this text using a long
> s in the second case, then the boundaries and ligatures *are*
> predictable, and you can dispense with the ZWL's.

I would like to know if U+017F [long s] is a good candidate for this job, and in
general, if it should be used or avoided. Suppose that U+017F [long s] is used
in a German text (using a Fraktur font which contains a glyph for the long s and
also glyphs for all ligatures).

3. Will U+017F [long s] be displayed with the glyph of U+0073 [short s] if the
text is displayed in a roman (Antiqua) font and that font does not contain a
glyph for U+017F [long s] ? (Or will there appear an empty rectangle or
something else denoting a missing character?)

4. Will a German spelling checker that knows the word "Wachstube" recognize the
word "Wach<U+017F>tube" and accept it?

5. Will a hyphenation engine handle the word "Wach<U+017F>tube" correctly (which
means: hyphenate it as Wach-stu-be)?

(If at least one of the last 3 questions must be answered with "no", I would
agree with Asmus Freytag and decide that U+017F [long s] should never be used in
the depicted context.)
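
Questions 4 and 5 can be made concrete with a minimal sketch. It assumes that a
spelling checker or hyphenation engine is free to fold U+017F to U+0073 before
looking a word up (the Unicode character data does give U+017F a compatibility
decomposition to U+0073, which is what NFKC normalization applies). The word
list, the break-point table, and the function names are hypothetical stand-ins,
not the behavior of any existing product.

```python
import unicodedata

LONG_S = "\u017f"   # LATIN SMALL LETTER LONG S
ROUND_S = "\u0073"  # LATIN SMALL LETTER S

# NFKC normalization maps the long s to the ordinary s.
assert unicodedata.normalize("NFKC", LONG_S) == ROUND_S

# Hypothetical stand-ins for a real word list and hyphenation table.
KNOWN_WORDS = {"Wachstube"}
BREAK_POINTS = {"Wachstube": (4, 7)}  # Wach-stu-be


def fold_long_s(word):
    """Replace every long s with a round s (one-to-one, so offsets are kept)."""
    return word.replace(LONG_S, ROUND_S)


def is_known(word):
    """Spell-check lookup on the folded form of the word."""
    return fold_long_s(word) in KNOWN_WORDS


def hyphenate(word):
    """Look up break points on the folded form, apply them to the original."""
    points = BREAK_POINTS.get(fold_long_s(word), ())
    pieces = []
    start = 0
    for point in points:
        pieces.append(word[start:point])
        start = point
    pieces.append(word[start:])
    return "-".join(pieces)
```

Because the fold replaces one character with one character, the break offsets
found for the folded word apply unchanged to the original spelling, so the long
s survives in the hyphenated output. Whether real spelling checkers and
hyphenation engines perform such a fold is exactly what questions 4 and 5 ask.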

During the discussion about Latin ligatures I have learned a lot about the ideas
of ligature handling by rendering engines, operating systems and application
software. If automatic or semi-automatic ligation is to be set up for Fraktur
typesetting, the usage of U+017F [long s] could make the implementation of
ligation rules much easier. I am thinking of the following rules, e.g.:

The character sequence <U+017F><c><h> must always be rendered with a <long
s-c-h> ligature.
The character sequence <U+0073><c><h> must always be rendered with a <short s>
plus <c-h> ligature.
The character sequence <?><c><h> (with <?> neither <U+017F> nor <U+0073>) can
almost always be rendered with <?> plus <c-h> ligature.
The character sequence <U+0073><t> must never be rendered with a <long s-t>
ligature.
The character sequence <U+017F><t> must always be rendered with a <long s-t>
ligature.
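
The rules above can be sketched as a small longest-match segmenter. This is
only an illustration under stated assumptions: the ligature inventory is
limited to the three ligatures named in the rules, the function name is
hypothetical, and the "almost always" case for <?><c><h> is treated as
"always", so the exceptions are not handled.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# Ligatures named in the rules, longest first so that <long s-c-h>
# wins over <c-h> when both could match at the same position.
LIGATURES = (LONG_S + "ch", "ch", LONG_S + "t")


def segment(text):
    """Split text into glyph units: ligatures where a rule applies,
    single characters everywhere else."""
    glyphs = []
    i = 0
    while i < len(text):
        for ligature in LIGATURES:
            if text.startswith(ligature, i):
                glyphs.append(ligature)
                i += len(ligature)
                break
        else:
            # No ligature starts here. A round s (U+0073) always lands in
            # this branch, so it never joins a following t or ch.
            glyphs.append(text[i])
            i += 1
    return glyphs
```

With the long s encoded, segment("Wach\u017ftube") yields the units W, a, ch,
<long s-t>, u, b, e; with the round s, segment("Wachstube") yields W, a, ch, s,
t, u, b, e. No joiner or non-joiner characters are needed in either case, which
is the point of the remark quoted from Kenneth Whistler.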

So I really would like to know if the use of U+017F [long s] is recommended or

Thank you, Eberhard


Eberhard Pehlemann, Dorfstraße 7, D-23909 Giesensdorf, Germany, Tel. +49 4541
