Re: Special Type Sorts Tray 2001

From: Peter_Constable@sil.org
Date: Fri Oct 05 2001 - 15:13:02 EDT


>Indeed they would appear as a black box or something similar in most fonts.
>However, I feel that the availability of ligatured characters in a font at a
>specific official Unicode code point would be useful for the specific use of
>a person to be able to encode the ligature information directly, so that he
>or she may transcribe the typography of an eighteenth century printed book
>directly "metal type sort to unicode character" and print out the text.

Yes, this kind of thing should be possible, but in rich/styled text, not
in plain text. Similarly, there are people out there who would like to
encode manuscripts etc. in electronic form, but that is also more than
should be expected of plain text.

>but I do feel
>that the ligatured character facilities should be available for use in
>appropriate circumstances.

Sure, but do those circumstances really require plain text?

> I feel that as their usefulness was such that
>ligatured characters could be cast in some fonts in metal type right up
>until the end of the mainstream use of metal type, then it is reasonable
>that the use of such ligatured characters could be continued indefinitely
>into the future using unicode. There may well be uses in desktop publishing
>for the typesetting of various decorative items.

And that can be done using technologies that are starting to become
mainstream without requiring that those decorative items be directly
encoded.

>running on Windows 95, that will continue to be in use for many years. As
>far as I know, Word 97 can only use ligatured characters such as ct if they
>are (1) encoded in a font and (2) the character is inserted into the
>document whenever required using Insert Symbol or using a short cut key set
>up from within Insert Symbol.

Other methods for insertion are possible (e.g. you could use Keyman to
create an input method), but as far as the encoding and fonts needed to
work with Word 97 are concerned, that is correct. But it should not be a
requirement on any technology that advanced functionality be automatically
supported on older products. That is simply too costly, and often just not
possible. Do you expect the PC you bought in 1996 (probably a 200 MHz
machine with 64 MB of RAM and a 2 GB drive) to do digital video editing?
Probably not. Similarly, we shouldn't necessarily expect our 1996 software
to support functionality being developed today.

You will likely respond that the comparison isn't valid because it would
be technically very simple for Word 97 to handle a ct ligature if it was
just encoded in Unicode. That's true. But the ct ligature is just a drop
in a very big bucket that involves a number of complicating factors that
aren't being considered. For example, one user creates a document that
contains "Wellington's victory over Napoleon" using a ct ligature, and
another user creates a different document on the same topic but doesn't
use a ligature. Then a third user is trying to retrieve documents on the
topic and knows that there are two documents out there but has no idea
that one or the other might use a ligature and so be encoded differently.
They just search for "victory", and they only get half the results they
were expecting returned to them. Multiply this problem by the untold
hundreds or thousands of different ligatures that might possibly be
included. In addition to this data retrieval scenario, consider various
kinds of text support functionality, like case mapping or spell checking:
how can someone write algorithms to deal with that decently when next
month there may be new ligatures in the standard creating
geometrically-increasing options for how things can be represented? They
can't. So, then, the logical question is whether a normalisation should be
defined that folds the distinction between the two forms of "ct" (and
likewise for all the other ligatures that have been added). But then we
still have the problem that software can't be designed with any hope of
stability since we have no way of knowing what new ligatures (and hence
new normalisations) might need to be supported tomorrow.
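
The retrieval scenario above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions: Unicode has no ct ligature,
so the code point U+E000 (a Private Use Area value) stands in for one, and
the names naive_search, fold and LIGATURE_FOLDS are made up for the example.

```python
# Hypothetical: pretend a "ct" ligature had been encoded at U+E000.
# U+E000 is a Private Use Area code point chosen only for illustration.
CT_LIGATURE = "\ue000"

doc_plain = "Wellington's victory over Napoleon"
# Same sentence, but typed by a user who inserted the ligature character.
doc_ligature = doc_plain.replace("ct", CT_LIGATURE)

documents = [doc_plain, doc_ligature]


def naive_search(term, docs):
    """Return the documents containing term, comparing code points as-is."""
    return [d for d in docs if term in d]


# Folding table that every search tool would have to carry, and extend
# each time another ligature was added to the standard.
LIGATURE_FOLDS = {CT_LIGATURE: "ct"}


def fold(text):
    """Replace each known ligature character with its letter sequence."""
    for ligature, letters in LIGATURE_FOLDS.items():
        text = text.replace(ligature, letters)
    return text


def folded_search(term, docs):
    """Search after folding ligatures in both the query and the documents."""
    term = fold(term)
    return [d for d in docs if term in fold(d)]


naive_hits = naive_search("victory", documents)
folded_hits = folded_search("victory", documents)
print(len(naive_hits), len(folded_hits))
```

The plain search finds only one of the two documents. The folded search
finds both, but only for as long as LIGATURE_FOLDS lists every ligature in
use, which is exactly the stability problem described above.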

In the big scheme of things, there is a simpler solution: allow ligatures
to be handled using advanced typography technologies, which are being
deployed *anyway* to support scripts like Arabic, etc. The cost is that
these ligatures are not supported in Word 97, but in 10 years virtually
nobody will still be using Word 97 anyway. On the other hand, the
normalisation problems the other approach would create would still be with
us 65 years from now. Asking for something to work in Word 97 is being
somewhat short-sighted.
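
As a rough picture of what "handled by typography technology" means, here is
a toy Python model of ligature substitution at display time. It is a sketch,
not how any real rendering engine is implemented: the glyph names c_t and
s_t, the table LIGATURE_GLYPHS and the function shape are all assumptions
made for the example.

```python
# Hypothetical font data: ligature glyph names keyed by the letter
# sequence they replace. A real font would carry this in its own tables.
LIGATURE_GLYPHS = {"ct": "c_t", "st": "s_t"}


def shape(text, ligatures=LIGATURE_GLYPHS):
    """Map plain text to a list of glyph names, preferring ligatures."""
    glyphs = []
    longest = max((len(seq) for seq in ligatures), default=1)
    i = 0
    while i < len(text):
        # Try the longest candidate sequence first, down to two letters.
        for n in range(longest, 1, -1):
            chunk = text[i:i + n]
            if chunk in ligatures:
                glyphs.append(ligatures[chunk])
                i += n
                break
        else:
            # No ligature starts here: emit the single-letter glyph.
            glyphs.append(text[i])
            i += 1
    return glyphs


text = "victory"
glyphs = shape(text)
print(glyphs)
```

The stored text is still the seven letters of "victory", so searching,
spell checking and case mapping see nothing unusual; only the glyph list
handed to the display contains the ligature.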

>Perhaps many people will have seen open access
>rooms in colleges where there are a number of newer machines and then
>gradually as one moves to the end of the room there are all sorts of older
>machines with older software being fully utilized by students preparing a
>paper.

I appreciate the concerns regarding older systems (e.g. Word 97). I have
to deal with that as I try to support people in our organisation around
the globe since they generally don't have budgets that allow them to
update systems very often, and they also have to work with local
colleagues who are on even tighter budgets. As a result, I have assumed
that I need to find solutions for people to work with non-Roman scripts
that will work on down-level systems. It is the case that, if we can
accomplish that, there will be people who benefit from it. I've been
surprised, though, to learn that a lot more will update their systems than
I expected -- they'll do it if there is a reason for them to do it. Once
we have apps that work with Unicode and advanced typography technologies
that will allow them to work with the non-Roman scripts they use, they'll
make the change to Windows XP and Office.Net if that's what it will take
to obtain that functionality. So, the fact that there are still machines
out there with Word 97 installed doesn't keep me from reaching the
conclusion that it's OK if support for things like ligatures and other
aspects of advanced typography only work on newer systems.

>I would mention in passing, for completeness, the possibility of having a ct
>character as a bitmap

Yes, that would be more awkward than it's worth. You can always create a
symbol-encoded font that contains the ligatures you want to use. (On a
Windows system, that will in effect encode them in the PUA range U+F020 to
U+F0FF; that range will be shared with other symbol sets, which nobody
here will object to. Note, however, that you'll lose some functionality --
you won't be able to spell check, for example.)
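
A small Python sketch of that PUA mapping, assuming the simple offset implied
by the range above (font character code 0x20 to 0xFF plus 0xF000). The
function name symbol_code_to_pua and the choice of slot 0x63 for a ct
ligature are hypothetical.

```python
import unicodedata


def symbol_code_to_pua(code):
    """Map a symbol-font character code (0x20-0xFF) into U+F020-U+F0FF."""
    if not 0x20 <= code <= 0xFF:
        raise ValueError("symbol font codes run from 0x20 to 0xFF")
    return chr(0xF000 + code)


# Hypothetical: the font designer put the ct ligature in slot 0x63.
ct_in_symbol_font = symbol_code_to_pua(0x63)

# General category "Co" means private use: the code point itself carries
# no letter identity that software could rely on.
category = unicodedata.category(ct_in_symbol_font)
print(f"U+{ord(ct_in_symbol_font):04X}", category)
```

Because the character is private use, a spell checker or search tool has no
way to know it stands for the letters c and t, which is the loss of
functionality noted above.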

>A designating of certain characters as being quaint
>characters might perhaps be a way out of the problem and that thus ct and
>various long s ligatures could be defined as quaint characters such that
>they have unique official unicode positions yet are outside of regular usage
>where database sorting might be needed. Does that solve the problem of
>including them as presentation forms?

IMHO, no, and I'm inclined to respond that I haven't been at all convinced
of the problem that including them in the presentation forms is expected to
solve. The fact that Word 97 can't support a ct ligature without it being
directly encoded is not IMHO a serious problem, whereas the potential
implications of including a ct ligature (and others) in Unicode are.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>


