Re: Latin ligatures and Unicode

From: Glen Perkins
Date: Tue Dec 28 1999 - 15:43:35 EST

From: <>

> As I see it, what is considered as a glyph or font variation in one language
> or context (e.g. modern prose) could be considered as a significant
> difference in another language or context (e.g. ancient poetry).
> In the cases when these differences become significant, there should be a
> way to encode them in plain text,

Certainly there are mystery novels whose dialog includes italics in order to
capture more of the semantics of the spoken utterances. Without those
italics you would lose meaningful clues to the mystery that the author needs
to convey to the reader. The whole mystery could hinge on which word is
italicized in a particularly meaningful utterance that would not have the
same meaning without italics. Italics in an English magazine article could
also make it possible to refer to the French /chat/ without causing the
reader to misinterpret it as the English chat. The italics aren't used in
these cases for esthetics; they're used to make the writing comprehensible.
Should italic markup therefore be added to Unicode? (I'm not proposing it,
by the way.)

It seems likely that there will be cases where ligatures will be
semantically significant -- examples that are even stronger than any we can
come up with artificially in the course of a debate. The question becomes
one of where Unicode draws the line between the semantically significant
information it includes, and the semantically significant information it
leaves to higher-level markup or automated systems. It's not clear to me --
and apparently I'm not alone -- where that line is, especially given the
number of "ZERO-WIDTH X" (not to mention a whole plane of language tag)
markup characters already in Unicode.
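
To make the "ZERO-WIDTH X" point concrete, here is a minimal Python sketch
(not part of the original discussion) showing that such characters live in the
plain-text code point sequence itself, even though they draw nothing. It uses
only the standard unicodedata module. The helper name strip_format_controls is
hypothetical, and whether a renderer actually suppresses an "fi" ligature at a
ZERO WIDTH NON-JOINER depends on the font and layout engine, which this sketch
does not test.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: a request not to join/ligate here

def strip_format_controls(text: str) -> str:
    """Drop every character in general category Cf (format controls).

    Hypothetical helper: it shows how much "markup" is riding along
    inside what looks like ordinary plain text.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

plain = "fish"
marked = "f" + ZWNJ + "ish"  # same visible letters, one extra code point

# A language tag for "fr" built from Plane 14 tag characters:
# U+E0001 LANGUAGE TAG followed by the tag counterparts of 'f' and 'r'.
tag_fr = "\U000E0001" + "".join(chr(0xE0000 + ord(c)) for c in "fr")
tagged = tag_fr + "chat"

for ch in marked + tag_fr:
    if unicodedata.category(ch) == "Cf":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The two strings "fish" and "f<ZWNJ>ish" compare unequal as plain text, yet
stripping the format controls makes them identical, which is exactly the sense
in which these characters behave like in-band markup.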

Wherever that line is drawn, though, advocates of the first thing excluded
will be able to make a pretty good case that it's essentially the same as the
last thing included. If that argument (e.g. "this contains semantic info as important
as hyphenation") is sufficient, then a lot more markup is coming, for better
or worse.

Glen Perkins

