RE: Latin ligatures and Unicode

From: Asmus Freytag (
Date: Wed Dec 29 1999 - 06:16:34 EST

At 04:28 PM 12/28/99 -0800, Kenneth Whistler wrote:
> ...Simply because there is a need for control of representation
> of ligatures in rendered text does not imply that encoding a ZWL
> [zero width ligator] or ZWNL [zero width non-ligator] (or both) *as* a
> character is required to do so.
> People can ask for new ligatures and variants. If they ask them of
> the appropriate vendors -- the developers of fonts, particularly those
> using the technology Mark and John are talking about -- then there
> is a reasonable chance they will get what they are asking for. If,
> however, they besiege character encoding committees asking for ligatures
> and variants to be encoded *as* characters, then they won't.

I don't see this as something to go and ask individual vendors for. I would
argue that it is an issue for the entire vendor community (as represented in
Unicode, but also W3C).

What is at the heart of this recurring request is that support for many
(or older) typographies is incomplete without an *interchangeable*
method of indicating the presence or absence of ligatures.

Plain text used to be the *only* medium with near universal
interchangeability. With the web, this has changed. It is now appropriate
to move this discussion to a higher plane and consider the question:

What is the best way to interchange text containing ligatures on the web?

Posing this question allows us to consider the full-featured typographic
and aesthetic requirements for ligation - as well as any inherent
regularities. Once we have a design in place for interchanging ligatures
with marked up text, we can revisit that and see whether replacing markup
instructions by character codes gives better results.

I feel we have explored the semantic aspects of this long enough to
conclude that there is some evidence that a ZWNL is linked slightly more to
the underlying semantic content of the text than a ZWL, but that in
neither case do we have enough to settle the argument in favor of making
them characters today.

Both concepts ('ligate here', 'don't ligate here') can in principle be
expressed with HTML or XML style markup - I have seen too little discussion
of what this markup should be like, and what the consequences are of it
being present in the middle of words. Is that something that the HTML/XML
community wants to deal with?

The next question, assuming that we agree on what ligation commands look
like in markup, concerns interchange between parts of a program, e.g. text
processor to rendering engine. Is it meaningful to have character codes at
that level, or is it more typical that each ligature is its own little
style run?
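
To make the two options concrete, here is a rough sketch only. The code
points are hypothetical stand-ins borrowed from the private use area (neither
ZWL nor ZWNL is an encoded character), and the function name and run format
are invented for this illustration. It takes text carrying in-stream control
codes and turns each one into its own little style run over the plain text:

```python
# Hypothetical stand-ins: ZWL and ZWNL are not encoded characters, so
# private-use code points are borrowed here purely for illustration.
ZWL = "\uE000"   # "ligate here" (assumed code point)
ZWNL = "\uE001"  # "don't ligate here" (assumed code point)


def to_style_runs(text):
    """Separate in-stream control codes from the text.

    Returns (plain_text, runs). Each run is (start, end, kind), an
    index range over plain_text covering the letter before and the
    letter after the position where a control code appeared.
    """
    plain = []
    joins = []
    for ch in text:
        if ch == ZWL:
            joins.append((len(plain), "ligate"))
        elif ch == ZWNL:
            joins.append((len(plain), "no-ligate"))
        else:
            plain.append(ch)

    n = len(plain)
    # Clamp so a control code at the very start or end still yields a valid range.
    runs = [(max(pos - 1, 0), min(pos + 1, n), kind) for pos, kind in joins]
    return "".join(plain), runs


plain, runs = to_style_runs("shelf" + ZWNL + "ful")
for start, end, kind in runs:
    print(plain[start:end], kind)
```

In the style-run form the plain text carries no extra characters at all,
which is the trade-off in question: the ligation information survives only
as long as the run list travels with the text.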

The strongest arguments in favor of character codes come from those who
have for a long time needed to 'trick' various applications into supporting
what they were not explicitly designed for. If character codes would result
in 'enabling' many of these implementations, by letting the author
communicate with the rendering engine, so to speak, that is itself a valid
argument to consider. (It would need some actual case studies where this
approach is shown to work.)

Still, even that would need to be contrasted with the cost to applications
that do not know about these as characters and end up showing 'boxes'.