Re: Ligatured characters

From: Peter_Constable@sil.org
Date: Tue Sep 12 2000 - 12:21:04 EDT


In practice, ligatures do not need to be directly encoded in Unicode. A
small number of ligatures are encoded, but they were added only because
they were needed to support existing data in legacy encodings that
included them. New data created using Unicode can ignore these characters,
and in most cases that would be the preferred practice.
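
As a rough illustration of the point about legacy ligature characters, here
is a minimal Python sketch that lists the Latin ligatures at U+FB00..U+FB06
and shows the plain character sequences they fold to under NFKC
normalization. The helper name strip_ligatures is only an illustrative
choice, and note that NFKC also folds many other compatibility characters,
so it is a blunt tool rather than a ligature-specific one.

```python
import unicodedata

# U+FB00..U+FB06 are the Latin ligatures in the Alphabetic Presentation
# Forms block; they carry compatibility decompositions to plain letters.
for cp in range(0xFB00, 0xFB07):
    ch = chr(cp)
    plain = unicodedata.normalize("NFKC", ch)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {plain!r}")


def strip_ligatures(text):
    """Replace compatibility characters, ligatures included, by their
    plain equivalents. Illustrative helper: NFKC is not limited to
    ligatures and will fold other compatibility characters as well."""
    return unicodedata.normalize("NFKC", text)
```

For example, strip_ligatures("\ufb01nd") gives the four-character string
"find", which is what new Unicode data would normally store in the first
place.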

Of course, you want to be able to present documents using ligated forms,
and so support for them must be provided somewhere. That is considered to
be the responsibility of font and rendering sub-systems. Unicode assumes a
dichotomy between the sphere of *characters* and the sphere of *glyphs*,
and it is also assumed that systems that use Unicode will include
sub-systems that provide a mapping from characters to glyphs. Unicode is
intended to support distinctions that would be needed in plain text, i.e.
distinctions that pertain to the meaning of text. So, for example, the
meaning of a text wouldn't change according to whether a sequence of "fi"
is or isn't ligated. (One might respond saying that in Turkish there would
be a semantic difference, but that is not comparing apples with apples
since the ligated form would be a ligation of f with dotless-i. The example
would be rephrased as comparing ligated and non-ligated forms of <f,
dotless-i>.)

There is a short list of "smart-font" rendering technologies that are
coming on line: Apple Advanced Typography is available on the Mac OS, and
has existed in some form for many years now (though TrueType GX didn't
support Unicode). OpenType is beginning to make its way throughout
Microsoft's and Adobe's product lines. SIL's Graphite rendering system is
in the wings, and there has been considerable interest expressed in getting
this adopted on various platforms and in various environments. Any of
these font technologies is able to present character sequences such as
<f, f, i> or <c, t> using ligated glyphs.
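
To make the character-to-glyph mapping concrete, here is a toy Python
sketch of ligature substitution by greedy longest match. It is not how
AAT, OpenType, or Graphite are actually implemented; the LIGATURES table,
the glyph names, and the shape function are all made up for illustration.
The point is only that the stored text stays as plain characters and the
ligated glyph is chosen at rendering time.

```python
# Hypothetical ligature table: character sequence -> glyph name.
# A real font carries this kind of information in its own tables.
LIGATURES = {
    "ffi": "f_f_i",
    "ff": "f_f",
    "fi": "f_i",
    "fl": "f_l",
    "ct": "c_t",
}
MAX_LEN = max(len(seq) for seq in LIGATURES)


def shape(text, use_ligatures=True):
    """Map a character string to a list of glyph names.

    With use_ligatures=True, the longest matching sequence at each
    position is replaced by a single ligature glyph; otherwise every
    character maps to its own glyph.
    """
    glyphs = []
    i = 0
    while i < len(text):
        matched = False
        if use_ligatures:
            # Try the longest candidate first, down to length 2.
            for n in range(min(MAX_LEN, len(text) - i), 1, -1):
                seq = text[i:i + n]
                if seq in LIGATURES:
                    glyphs.append(LIGATURES[seq])
                    i += n
                    matched = True
                    break
        if not matched:
            glyphs.append(text[i])
            i += 1
    return glyphs
```

With this table, shape("office") yields ['o', 'f_f_i', 'c', 'e'], while
shape("office", use_ligatures=False) yields one glyph per character. The
underlying text is identical in both cases.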

The mechanisms for controlling the use of ligated forms can work in the
following ways:

- A font developer can specify that ligated forms are the default, in which
case applications don't need to explicitly request them.
- A font developer can provide ligated forms as non-default options; in
this case, applications need to explicitly request them, and the assumption
is that the app would typically provide a UI for users to specify where
they occur, and would store information about which sequences are to be
ligated along with other formatting information.
- For cases in which ligation plays an important role in the workings of a
script, as with Indic conjuncts, special Unicode characters (virama, ZWJ,
ZWNJ) can be used to control ligation; this may be extensible to other
special cases in which special ligation behaviour is needed for a
particular language in a script in which ligation is normally optional and
for typographic finesse only, such as Latin.
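
The third mechanism can be sketched in Python using the Devanagari
conjunct of KA and SSA as the example. The variable and function names
below are illustrative only; the code builds the character sequences and
says nothing about how a particular font will render them.

```python
import unicodedata

ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
KA = "\u0915"      # DEVANAGARI LETTER KA
SSA = "\u0937"     # DEVANAGARI LETTER SSA

# Default: the renderer may form the conjunct ligature.
conjunct = KA + VIRAMA + SSA
# ZWNJ after the virama requests a visible virama, no conjunct.
explicit_virama = KA + VIRAMA + ZWNJ + SSA
# ZWJ after the virama requests the half form of KA.
half_form = KA + VIRAMA + ZWJ + SSA


def strip_joiners(text):
    """Remove ZWJ and ZWNJ, e.g. before comparing or searching text."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def describe(text):
    """List the code point and character name for each character."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


for label, seq in [("conjunct", conjunct),
                   ("explicit virama", explicit_virama),
                   ("half form", half_form)]:
    print(label, describe(seq))
```

All three sequences reduce to the same <KA, VIRAMA, SSA> once the joiners
are stripped; the joiners affect only which form is requested for display.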

I hope this clarifies these matters for you. If you have questions about
the font technologies, you might want to post an inquiry on the OpenType
list.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT