Re: Private Use proposals

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Wed May 22 2002 - 12:20:45 EDT


Thank you for your reply.

>I do feel I need to comment in regard to your two messages, totaling 19
>KB, which are largely focused on the Private Use Area and quasi-official
>codifications of its usage.

Well, the ideas are not intended to be quasi-official. Just one end user of
the Unicode system seeking to use the Private Use Area to good effect and
putting forward ideas to other end users who might like to consider using
some of the facilities suggested.

>Many of the issues relating to the PUA have been discussed numerous
>times on this list. We all know, as you state in one of your posts,
>that Unicode is committed to leaving the PUA free and available to all
>users, to the point that they will not sanction any "semi-official"
>mappings of characters to the PUA nor any indexing mechanism to
>reference such mappings. I used to wonder why there was no link on the
>Unicode Web site to ConScript, since it seemed to me like a creative use
>of Unicode. Now I understand that such a link might be misinterpreted
>as an official endorsement of ConScript.

Well, I have found that if I don't mention the fact that Private Use Area
allocations will not be endorsed then someone will usually point out the
fact as if saying so refutes my suggestion.

Also, I was replying to someone who may possibly not have known about the
matter.

A pity that ConScript cannot be linked from the Unicode website. I
would have thought that mentioning that various people are using the Private
Use Area in various ways and, after having stated the non-endorsement rule,
providing a few links would have been all right. Still, it's not my website
and maybe there are legal implications of which I am unaware should the
Unicode Consortium provide such a link.

>1. There is *strong* opposition to encoding additional presentation
>forms for alphabetic characters. Ligatures are presentation forms.
>Beginning with version 3.1, Unicode has stated that alphabetic ligatures
>may be formed with the help of U+200D ZERO WIDTH JOINER, or
>automatically by the font without explicit encoding. (As Michael
>Everson pointed out in his "zero-width ligator" papers, the
>automatic-formation approach requires hairy contextual analysis in cases
>such as Fraktur.) But the whole reason for inventing these solutions is
>that additional Latin ligatures are EXTREMELY unlikely to be encoded.

Well, I had a search for Mr Everson's papers and found three of them on his
http://www.evertype.com website. I have had a look through them and hope to
have a further read later. These are n2141.pdf, n2147.pdf and n2317.pdf.
Now, the fact is that Michael suggested a feature named ZERO WIDTH LIGATOR
specifically for the purpose of ligation and it appears that that suggestion
has not been accepted, but that a shared solution with a code point that can
also mean something else has been decided upon. Now, I do not know the
details of all of this and I certainly hope to study the matter more, yet,
as someone who is not a linguist as such but an inventor and programmer, I
have a concern that using one code point for two types of meaning rather
than one code point for each type of meaning is what I call a software
unicorn. The concept of a software unicorn can be read about on
http://www.users.globalnet.co.uk/~ngo/euto0008.htm if anyone is interested.
Certainly, if I were encoding ligation as an operator I feel that I would
perhaps tend to introduce two new code points, namely PLEASE LIGATE THE NEXT
TWO CHARACTERS and PLEASE LIGATE THE NEXT THREE CHARACTERS rather than a
ZERO WIDTH LIGATOR which is only reached after the first character of the
ligature has been reached, yet I would also try to make sure, as Michael has
done, that ligation was carefully separated from other operations.
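
The difference between the two styles of operator can be sketched in a few
lines of Python. This is only an illustration: U+200D ZERO WIDTH JOINER is
the code point named in the quoted text, whereas the two prefix operators
are hypothetical and are parked at U+E000 and U+E001 in the Private Use Area
purely as placeholders. Nothing here is an allocation that anyone has
proposed.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER, as named in the quoted text

# Hypothetical prefix operators; these placeholder code points are an
# assumption made for this sketch, not a proposed allocation.
LIGATE_NEXT_TWO = "\uE000"
LIGATE_NEXT_THREE = "\uE001"
PREFIX_OPERATORS = {2: LIGATE_NEXT_TWO, 3: LIGATE_NEXT_THREE}


def ligate_infix(letters: str) -> str:
    """Request a ligature by placing ZWJ between each pair of letters."""
    return ZWJ.join(letters)


def ligate_prefix(letters: str) -> str:
    """Request a ligature by announcing its length before the first letter."""
    if len(letters) not in PREFIX_OPERATORS:
        raise ValueError("this sketch only covers two- and three-letter ligatures")
    return PREFIX_OPERATORS[len(letters)] + letters


def plain_text(marked: str) -> str:
    """Drop every ligation request, recovering the underlying letters."""
    controls = {ZWJ, LIGATE_NEXT_TWO, LIGATE_NEXT_THREE}
    return "".join(ch for ch in marked if ch not in controls)
```

With the infix form a renderer learns that a ligature is wanted only after
it has already read the first letter. With the prefix form it knows before
reading any letter of the ligature. In both cases the underlying letters
remain in the text and can be recovered by discarding the operators.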

I feel that perhaps the matter needs to be looked at again and Michael's
suggestion for the ZERO WIDTH LIGATOR reviewed with the possibility that it
become accepted after all. I am wondering whether to add ligation
facilities into the U+F3.. block of my usage of the Private Use Area so as
to provide a safety net in case those software unicorns start galloping, for
as the song goes, a castle of software can fall to the ground, if over its
drawbridge their golden hooves pound!

Yet I do not regard my desire to formally encode some more ligatures into
the U+FB.. block as in any way contradictory to the use of a ZERO WIDTH
LIGATOR code point. There is room for both methods to be available and the
end user can make his or her choice depending upon the application and upon
what facilities are available to him or her. Someone compiling a dictionary
might well use a ligator approach. If someone wishes to set Fraktur or set
in an 18th Century English style, using TrueType founts with a wordprocessing
package or a multimedia authoring package, then using ligature characters
might be the best approach. Certainly my background in setting metal type
perhaps influences me to want to have the ligature characters as such, yet
metal type was used for centuries and transcribing of texts onto computers
may well take place. I accept that there can be problems of an ever
expanding set of ligatures, yet the glyph design has got to be available for
the ligature somehow and the "one code point gives one ligature character"
approach is certainly effective with wordprocessing software and multimedia
authoring packages.
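
As a rough sketch of the "one code point gives one ligature character"
approach, the following Python uses only the Latin ligatures that are
already encoded in the U+FB.. block (ff, fi, fl, ffi and ffl). The function
names are made up for this illustration. Going back to plain letters relies
on compatibility normalization (NFKC), which also rewrites other
compatibility characters, so that step is an assumption about what an
application would find acceptable.

```python
import unicodedata

# Latin ligatures already encoded in the U+FB00 block, longest sequences
# first so that "ffi" is tried before "ff" and "fi".
LIGATURES = [
    ("ffi", "\uFB03"),
    ("ffl", "\uFB04"),
    ("ff", "\uFB00"),
    ("fi", "\uFB01"),
    ("fl", "\uFB02"),
]


def with_ligature_characters(text: str) -> str:
    """Replace letter sequences with the single ligature code point."""
    for letters, ligature in LIGATURES:
        text = text.replace(letters, ligature)
    return text


def without_ligature_characters(text: str) -> str:
    """Expand ligature characters back to letters via NFKC normalization.

    Note: NFKC also changes other compatibility characters in the text.
    """
    return unicodedata.normalize("NFKC", text)
```

A word such as "fjord" passes through unchanged, because no fj ligature is
encoded in that block. That is exactly the kind of gap that a published list
of Private Use Area code points would be meant to fill.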

As to strong opposition to encoding additional presentation forms for
alphabetic characters, well, we live in a democratic society and if some
people who would like to produce quality printing feel that using a TrueType
fount with some ligature characters does what they want and harms no one
else, what exactly is the objection? Certainly, if it were one method or
the other, then the ligator operator would be best, yet it is not
necessarily one method or the other: there is scope to encode the ligatures
for Fraktur and for 18th Century English books and also have Michael's ZERO
WIDTH LIGATOR. Maybe there needs to be a note about which method is
considered the most appropriate for each major use, yet there is, I suggest,
scope for both methods to be encoded into regular Unicode. I hope that that
will happen at some time in the future. For the moment, I am trying to have
a discussion about it with a view to producing a list of Private Use Area
code points by the weekend if possible. Then anyone who uses that list,
perhaps to produce a TrueType fount, can at least have a set of code points
to use.

>But don't expect that action to have any bearing on what
>UTC or WG2 does. They want formal proposals, and they have an official
>form. If I decide there is enough support for "lock" and "unlock" to
>warrant a proposal (the grass-roots vote so far is 3 for, 1 against), I
>will fill out the form. Doing a PUA implementation is fine, but has
>nothing to do with formal proposals.

Certainly they want proposals to be formal and on an official form.

Whether a Private Use Area implementation has nothing to do with
formal proposals is not, I feel, so clear cut. Certainly, I do not expect
the fact that I have suggested four particular code points for various
padlocks in the Private Use Area to influence a formal decision. Yet
suppose that, at various organizations, various people are, without making
any public announcement, trying out a fount with two or four padlock symbols
in it. Then maybe, just maybe, they will use the code points that I
suggested in my posting. If they do, any test applications that they make
using the padlock symbols expressed as Unicode code points may be
interoperable with test applications made by other researchers. That might
be of benefit at some stage in the future, if perhaps various people make
test founts with padlock symbols in them available for trials.
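
The interoperability point can be illustrated with a small Python sketch.
The table below is entirely hypothetical: the code points U+E000 and U+E001
and the names attached to them are placeholders invented for this example,
not the padlock code points from the earlier posting. The only fact relied
upon is that Private Use Area characters have the general category "Co".

```python
import unicodedata

# Hypothetical shared list: placeholder code points and names, assumed only
# for this sketch. Two parties interoperate if they use the same table.
SHARED_PUA_LIST = {
    0xE000: "EXAMPLE PADLOCK LOCKED",
    0xE001: "EXAMPLE PADLOCK UNLOCKED",
}


def is_private_use(ch: str) -> bool:
    """True if the character is a private use code point (category Co)."""
    return unicodedata.category(ch) == "Co"


def describe(text: str, agreed=SHARED_PUA_LIST) -> list:
    """Name each character, consulting the agreed list for private use ones."""
    names = []
    for ch in text:
        if is_private_use(ch):
            fallback = "UNASSIGNED PRIVATE USE U+%04X" % ord(ch)
            names.append(agreed.get(ord(ch), fallback))
        else:
            names.append(unicodedata.name(ch, "U+%04X" % ord(ch)))
    return names
```

Two test applications that each ship the same table will read one another's
text in the same way. An application without the table sees only an
unassigned private use code point, which is why a published list is useful
even without any endorsement.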

As for the ligatures, I feel that having a list of code points available is
worthwhile, so that anyone who does want to make a fount with ligature
characters in it has a list to use.

I am adding various characters to my list. A gentleman emailed to suggest
fj as a ligature as in the word fjord. Also, in Michael's documents there
are some additional characters, including an ft in Fraktur. It is
interesting. I have certainly learned a lot by following up your mention of
Michael's papers.

>
>Sorry for chewing up so much additional bandwidth.

Well, I enjoyed reading what you wrote. Thank you for replying so fully.

William Overington

22 May 2002


