Re: various stroked characters

From: William Overington
Date: Fri Sep 06 2002 - 02:28:45 EDT

Peter Constable wrote as follows.

>Well, only that it would make it easier for us to avoid proliferation of
>fonts while also providing fonts that our users can use with current
>software that does not yet provide means of selecting glyph alternates.

Would the intended long-term Unicode encoding be to follow the main character
code with one of the variation selectors in the range U+FE00 to U+FE0F, so as
to indicate the glyph alternate?
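To make the question concrete, here is a minimal Python sketch of what such a sequence looks like as stored text. The base character U+0111 and the selector U+FE00 are chosen purely for illustration; nothing here implies that this particular variation sequence is defined in the standard.

```python
# Illustration only: a base character followed by a variation selector.
# The pairing of U+0111 with U+FE00 is an assumption made for this example,
# not a registered variation sequence.
BASE = "\u0111"  # LATIN SMALL LETTER D WITH STROKE
VS1 = "\ufe00"   # VARIATION SELECTOR-1, first of the range U+FE00..U+FE0F


def is_variation_selector(ch):
    """Return True if ch lies in the range U+FE00 to U+FE0F."""
    return 0xFE00 <= ord(ch) <= 0xFE0F


def strip_variation_selectors(text):
    """Remove the selectors, leaving only the main character codes."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


sequence = BASE + VS1
print([f"U+{ord(ch):04X}" for ch in sequence])  # ['U+0111', 'U+FE00']
print(strip_variation_selectors(sequence) == BASE)  # True
```

Software that does not know about glyph alternates could strip or ignore the selector and still be left with the main character, which is what keeps searching and sorting workable.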

>But, I've been vocal about not encoding presentation forms more than once
>on this list, so I guess I need to swallow the same medicine I dish out.

Well, if you are referring to ligatures such as ct and the like, the
situation here is surely different. For example, suppose that, in relation
to the stroked characters, someone wishes to write a scholarly text
explaining the various characters which have been used; then the use of
different encodings for the glyph variations would seem reasonable.

It seems to me, however, that this discussion has the same form as the
discussions about precomposed ligature characters such as ct and the like in
the following respect.

It seems to be assumed that there are two possible solutions.

1. Define regular Unicode or Private Use Area code point allocations for the
presentation forms and produce a display now, though losing such features as
spellchecking and sorting.

2. Stick closely to a "tools and gadgets" encoding which is well engineered
and elegant, preserving such features as spellchecking and sorting, yet
either unusable on any available platform or, for some features, only usable
on "the very latest equipment".

Upon considering the matter, I wonder whether a third approach should become
mainstream. Documents would be encoded in the "tools and gadgets" manner and
used in that format for spellchecking and sorting. Software facilities, able
to run on all platforms, would then produce temporary documents "on the fly"
for display, using fonts such as ordinary TrueType fonts in which the
necessary presentation forms are accessible from individual code points,
such as code points in the Private Use Area.

Certainly, the fonts could be advanced format fonts which have the glyphs
accessible by two routes: a "tools and gadgets" sequence of code points and
also a direct code point access route.

I feel that software packages could be developed, whether as stand-alone
packages or as plug-ins, which take a Unicode file with "tools and gadgets"
encoded content as input and produce as output a Unicode file with "direct
code point" encoding. If so, then maybe the elegant "tools and gadgets"
encodings could become widely used sooner than would otherwise be the case,
and the desired displays could be used on other than the very latest
equipment.

I feel that this third approach might well be extremely useful.

I am aware that this approach does need tables of code points in the Private
Use Area to be widely used.

I am fully aware that the Unicode Consortium is unable by its rules to
provide any code point allocation tables for Private Use Area usage in this
or any other manner and that, even if it were able, it might well not wish
to do so. However, I am also aware that publication of such tables by end
users is permissible.

Now, I am aware, from my own publication of code point allocations in the
Private Use Area for ligatures, that not everyone approves of such
publication of code point allocations in the Private Use Area where those
code points are being suggested for use in interchanged documents. A
variety of views exists.

However, what I am suggesting here is qualitatively different. For this
"third approach" I am suggesting that documents be interchanged in a regular
Unicode "tools and gadgets" encoding, and that the documents which use
Private Use Area code points be only temporary documents, produced and used
locally for presentation and then discarded. The Private Use Area code point
allocations would nevertheless be published, so that there can be
portability and interworking between software, plug-ins and fonts from a
variety of suppliers, be the suppliers commercial organizations, academic
organizations or individuals.
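As a rough illustration of how a published table might be shared between software from different suppliers, here is a minimal Python sketch that reads such a table. The file format (one entry per line, private use code point, semicolon, then the regular Unicode sequence in hexadecimal) is entirely hypothetical and invented for this example; only the U+E707 entry for the ct ligature comes from this message.

```python
# Hypothetical table format, assumed for this sketch only:
#   <private use code point> ; <regular Unicode sequence>   # comment
SAMPLE_TABLE = """\
# private use code point ; regular Unicode sequence
E707 ; 0063 200D 0074   # ct ligature (c ZWJ t)
"""


def parse_table(text):
    """Return a dict mapping each regular sequence to its PUA character."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pua_hex, seq_hex = (part.strip() for part in line.split(";", 1))
        pua = chr(int(pua_hex, 16))
        # Accept only the Private Use Area of the Basic Multilingual Plane.
        if not 0xE000 <= ord(pua) <= 0xF8FF:
            raise ValueError(f"U+{pua_hex} is not a Private Use Area code point")
        sequence = "".join(chr(int(h, 16)) for h in seq_hex.split())
        table[sequence] = pua
    return table


table = parse_table(SAMPLE_TABLE)
```

Any plain-text format that several suppliers agree upon would serve equally well; the point is only that the same table can drive both the conversion software and the font.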

The idea is that eventually such Private Use Area code tables would be used
by fewer and fewer end user systems as the availability of software and
platforms which will directly decode and display all "tools and gadgets"
Unicode encodings increases. However, "eventually" might be a very long time
in view of the number of older computers deployed around the world.

For the avoidance of doubt, I am not suggesting that the Unicode Consortium
itself should produce those Private Use Area code tables. The necessary
software packages and the code point tables would need to be produced by end
users of the Unicode system.

An internet browser using this third approach would be a good project as
would a word processor. Could existing products be easily adapted or would
new products be needed?

A good benchmark test might be to send c ZWJ t in a document and display the
ct ligature on the screen of a computer which cannot use an advanced format
font. The receiving software would automatically produce a temporary local
document in which the c ZWJ t sequence of the transmitted document is
replaced by U+E707, and a font would supply a precomposed ct ligature glyph
at that code point.

How difficult is that benchmark to achieve please? Is it a major software
development or could it be written into a macro by a knowledgeable person
within a few hours?
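The text substitution at the heart of that benchmark can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names are made up for the example, and the reverse conversion is an addition not described above, included so that Private Use Area code points need not leak back into interchanged text. Displaying the result still requires a font with a ct ligature glyph at U+E707 and software that opens the temporary document, neither of which is covered here.

```python
ZWJ = "\u200d"               # ZERO WIDTH JOINER
CT_SEQUENCE = "c" + ZWJ + "t"  # the "tools and gadgets" encoding
CT_LIGATURE = "\ue707"       # Private Use Area code point suggested above


def to_display_form(text):
    """Build the temporary local text: c ZWJ t becomes U+E707."""
    return text.replace(CT_SEQUENCE, CT_LIGATURE)


def to_interchange_form(text):
    """Assumed reverse step: restore c ZWJ t before text leaves the machine."""
    return text.replace(CT_LIGATURE, CT_SEQUENCE)


# "a perfect action" with both ct pairs marked for ligation.
received = "a perfe" + CT_SEQUENCE + " a" + CT_SEQUENCE + "ion"
local = to_display_form(received)
```

The substitution itself is small. How much work it would take to fit it into a browser or word processor, so that the temporary document is produced and discarded automatically, cannot be judged from this sketch.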

William Overington

6 September 2002

This archive was generated by hypermail 2.1.2 : Fri Sep 06 2002 - 03:14:24 EDT