Re: Why Ligatures?

From: Kenneth Whistler
Date: Wed Oct 15 1997 - 18:43:40 EDT

> I guess I don't understand the reasoning behind having ligatures in a
> codepage. For example, Latin 00 proposal wants to add OE and oe
> ligatures to Latin-1.

Whatever its other flaws, the "Latin 00" proposal (which really should
be "Latin-9") is not mistaken about the OE and oe characters. These
are proposed to be added as characters, in support of French, in the
same way that AE and ae characters are present in support of data
representation for Scandinavian languages.

> Aren't these cases that the display logic should handle, but the data
> actually is 2 letters? Is there a semantic difference if the display
> showed 2 characters?

Yes. Just as for ae in Scandinavian countries.

> When I search for oe ligature code value,
> shouldn't I match with the octet pair o & e?

That depends. The issue of searching can be mapped to the issue
of collation. What is or is not equivalent depends on language-specific
collation rules. For English, U+00E6 LATIN SMALL LETTER AE is
treated as equivalent to a sequence of a+e. However, that is not
the case for Swedish or Danish. Hence, whether you match or not
needs to be tuned to the language of the data and the requirements
of the search.
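
To make the language dependence concrete, here is a minimal Python sketch
of a language-tuned substring match. The EXPANSIONS table and the helper
names (search_key, matches) are hypothetical and exist only for this
illustration; they are not taken from any standard or library, and a real
system would draw its equivalences from proper collation data.

```python
# Hypothetical per-language expansion tables, for illustration only.
# English treats the ae character as equivalent to the sequence a+e;
# Danish and Swedish treat it as a distinct letter, so nothing is expanded.
EXPANSIONS = {
    "en": {"\u00e6": "ae", "\u00c6": "AE"},
    "da": {},
    "sv": {},
}


def search_key(text, lang):
    """Rewrite text into the form used for matching in the given language."""
    table = EXPANSIONS.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text)


def matches(needle, haystack, lang):
    """Return True if needle occurs in haystack under the language's rules."""
    return search_key(needle, lang) in search_key(haystack, lang)
```

With these tables, matches("\u00e6", "encyclopaedia", "en") is true, while
the same call with "da" is false, because the Danish table leaves the ae
character unexpanded.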

The issue for U+0153 LATIN SMALL LIGATURE OE is effectively the same.
Don't be misled by the difference in names. If you look at the
fine print in the Unicode Standard, Version 2.0, you can discover
that the names "LETTER" and "LIGATURE" for ae and oe were reversed
between Unicode 1.0 and Unicode 1.1, to match national committee
requirements for 10646. A lot of vituperative argumentation about
whether either or both were actually letters or ligatures accompanied
these changes, but in both cases we have effectively the same
real situation:

   Each is an encoded character in Unicode/10646.
   Each may be a single orthographic unit in one or more languages.
   Each may be equated to a sequence of characters (i.e. a+e or o+e)
       for collation in one or more languages.
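
The first point can be checked against the character database that ships
with Python. This is only a sketch: the unicodedata module reflects
whatever Unicode version is bundled with the interpreter, which is newer
than 2.0, and the helper name describe is hypothetical. Both characters
carry a name but no decomposition mapping, so any equivalence to a+e or
o+e has to come from language-specific collation rules, not from the
character data itself.

```python
import unicodedata


def describe(ch):
    """Return (code point, character name, decomposition mapping) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))


for ch in ("\u00e6", "\u0153"):
    # An empty decomposition string means the character does not decompose.
    print(describe(ch))
```

The names differ ("LETTER" versus "LIGATURE"), but the decomposition field
is empty for both.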


> Thanks
> Bernard Chester
> EDM Localization Coordinator, FileNET Bellevue
> 425-450-1479
