Re: The golden ligatures collection ct ligature code in use.

From: William Overington (
Date: Tue Jun 04 2002 - 09:38:55 EDT

>> I then formatted the text in PowerPoint to 200 points, italic and green.
>> So, it appears that SC UniPad used in conjunction with Word and
>> PowerPoint can be used to prepare elegant presentations in the languages of
>> the world. Wow!

SC UniPad provides excellent input facilities for Unicode code points,
offering a selection of virtual keyboards for various languages and
scripts. However, it does not attempt to display text in anything other
than one fount, in one small size, in one colour, black. PowerPoint, by
contrast, can display text in many founts, many sizes and many colours, yet
it is quite tedious for entering text that uses accented characters.

Copying and pasting directly from SC UniPad to PowerPoint caused problems,
so I tried copying and pasting from SC UniPad to Word and then from Word to
PowerPoint. The text was still small and still black, and the fount, though
different, was not very different. So, to complete my experiment, I
formatted the text, which was now in PowerPoint, at 200 point size, in
italics and in green. That satisfied me completely that the whole process,
from keying in the text in SC UniPad to viewing a PowerPoint presentation,
was possible.

Having done that, I was then confident that, provided a suitable TrueType
fount for the required script is available for PowerPoint to use, elegantly
set out PowerPoint presentations in various scripts can be produced
straightforwardly: key in the text in SC UniPad using a virtual keyboard,
copy and paste it into Word, copy and paste it from Word into PowerPoint,
then use the facilities of PowerPoint to format the size and colour of the
lettering. The finished product would be a PowerPoint file, and the end user
of the presentation would see no sign that SC UniPad was used in its
preparation.



The idea is as follows.

Firstly, the background. In The Respectfully Experiment, Mr James Kass used the golden ligatures collection code for a ct ligature, U+E707, to designate, within a fount that he himself authors, the glyph for a ct ligature that is normally accessed indirectly through a sequence of characters; this also allows direct access to the glyph as a U+E707 character. In the light of that, I feel that there is scope for indirect and direct access to coexist in the same founts. Each method has its advantages, and they are not conflicting but complementary ways of using ligatures. For example, for work involving sorting, indexing a book, authoring a dictionary and so on, one would ideally use the c ZWJ t approach, whereas for situations where someone does not have the more modern facilities available, or is just setting, say, one page of text in a black letter face to print out and frame as a picture, using the golden ligatures collection codes would not be unacceptable. Thus founts could be fully modern, yet also have a standardized way of assigning code points to the glyphs used for ligatures. Indeed, this approach would also help people with older equipment: ligature characters could be entered as whole code points, and a standard software utility could then convert the resulting file to a format in which every ligature is broken down into the indirect form using ZWJ characters. It seems to me to be a very beneficial solution all round. If a fount designer needed a very special ligature not in the standard set of regular Unicode, he or she could still resort to the Private Use Area.
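
As a rough illustration of the kind of conversion utility mentioned in the paragraph above, here is a minimal Python sketch that replaces whole-ligature characters with ZWJ sequences. It assumes a simple one-to-one mapping table. U+E707 for ct is the golden ligatures collection code cited in this post, the U+FB.. entries are existing Unicode ligatures, and the function name `ligatures_to_zwj` is hypothetical.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Whole-ligature code points and the letters they stand for.
# U+E707 is the golden ligatures collection code for ct (Private Use Area);
# U+FB01, U+FB02 and U+FB06 are the existing Unicode fi, fl and st ligatures.
WHOLE_LIGATURES = {
    "\uE707": "ct",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB06": "st",
}


def ligatures_to_zwj(text: str) -> str:
    """Replace each whole-ligature character with its letters joined by ZWJ."""
    out = []
    for ch in text:
        letters = WHOLE_LIGATURES.get(ch)
        if letters is None:
            out.append(ch)  # ordinary character: copy unchanged
        else:
            out.append(ZWJ.join(letters))  # e.g. U+E707 -> "c" ZWJ "t"
    return "".join(out)


# "respectfully" keyed with the whole ct ligature, as in The Respectfully Experiment.
print(repr(ligatures_to_zwj("respe\uE707fully")))
```

Text without ligature characters passes through untouched, so such a utility could be run over whole files safely.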

Ideally these code points would be part of regular Unicode. I am aware that current policy is not to add any more ligature characters to regular Unicode. Yet, in view of this new approach of using code points for whole ligatures in conjunction with the ZWJ method, perhaps the matter might reach the agenda again and be reconsidered in the light of this new scientific evidence. If it were reconsidered in this way, then a number of ligature characters, drawn from some or all of those in the golden ligatures collection, together with any others that the committee thought desirable, such as those used for calligraphy, might be added to the U+FB.. block. Fount designers could then standardize on official Unicode and ISO codes, producing rigorous founts with this extra facility for the future.

I feel that it is idle to speculate as to whether the committees will actually consider this matter, or as to what they will or will not agree, or as to how likely they are to agree. The important matter is what actually happens. The fact of the matter is that, now that the golden ligatures collection list has been published and Mr James Kass has used the code for a ct ligature from the list in conjunction with an OpenType fount, there is new scientific evidence available which was not available when the decision not to encode any further ligature characters was made. Thus the decision that led to the present policy was based on the evidence available at the time and not on the present evidence. I feel it is important to note specifically that, since this new scientific evidence became available, the Unicode Consortium has not, as far as I am aware, made any statement as to whether it will or will not consider the matter of ligatures again. Not that I would expect the Unicode Consortium to make any such statement: my expectation is that if the Unicode Consortium at some future time receives a formal proposal, it will consider that proposal at that time in the light of the scientific evidence then available.

Suppose, for this next section, that a large collection of such ligatures has been encoded in the U+FB.. block. Suppose also that when someone posts a document to the Unicode list that includes a ligature character, the software producing the archive automatically converts any U+FB.. character into a sequence of single letters with ZWJ characters between them and stores it in the archive in that format. Any end user accessing the archive, perhaps with older equipment, could request that documents viewed in or saved from the archive be presented not in the normal "ZWJ format used for ligatures" way but in a "U+FB.. codes used for ligatures" way. This would be quite a straightforward option for the software to offer to end users: ZWJ as the default, U+FB.. block codes by special request.
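
A minimal Python sketch of the two directions such archive software would need, assuming the ligature set is just the existing U+FB00..U+FB06 characters (the larger encoded collection is hypothetical). The function names `to_archive` and `to_whole_codes` are illustrative only. The reverse direction tries longer ZWJ sequences first, so that f, ZWJ, f, ZWJ, i maps back to U+FB03 rather than to U+FB00 followed by a stray ZWJ and i.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Existing ligatures in the U+FB.. block, mapped to the letters they represent.
FB_LIGATURES = {
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
    "\uFB05": "\u017Ft",  # long s + t
    "\uFB06": "st",
}

# ZWJ form of each ligature, e.g. "s" ZWJ "t" for U+FB06.
ZWJ_FORMS = {lig: ZWJ.join(letters) for lig, letters in FB_LIGATURES.items()}

# Reverse table, longest ZWJ sequences first so that ffi wins over ff.
REVERSE = sorted(
    ((zwj_form, lig) for lig, zwj_form in ZWJ_FORMS.items()),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def to_archive(text: str) -> str:
    """Default storage form: every U+FB.. ligature becomes a ZWJ sequence."""
    return "".join(ZWJ_FORMS.get(ch, ch) for ch in text)


def to_whole_codes(text: str) -> str:
    """On special request: turn ZWJ ligature sequences back into U+FB.. codes."""
    out, i = [], 0
    while i < len(text):
        for zwj_form, lig in REVERSE:
            if text.startswith(zwj_form, i):
                out.append(lig)
                i += len(zwj_form)
                break
        else:  # no ligature sequence starts here: copy one character
            out.append(text[i])
            i += 1
    return "".join(out)


print(repr(to_archive("a\uFB06rolabe")))
print(repr(to_whole_codes(to_archive("o\uFB03ce"))))
```

Because the archive keeps only the ZWJ form, the "by special request" view is generated on the fly and never needs to be stored.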

Now, as to the proposed WATERMARK-LIKE MEMORY THAT A WHOLE LIGATURE WAS ORIGINALLY USED FOR THE FOLLOWING LIGATURE code: that code would be a regular Unicode code, would display as zero width, and would be ignored as regards significance in sorting, collating and so on. My reasoning for suggesting such a code is that if an archive takes in ligatures expressed in ZWJ format and stores them directly, and also takes in ligatures expressed in U+FB.. format, converts them to ZWJ format and then stores them, the owner of the database might like to keep a record of which form each ligature arrived in. The owner of the database might not care how the original coding was made, but he or she might! So, to provide for an owner who does wish to preserve a record that the original document used a whole ligature code rather than a ZWJ sequence, I suggested the WATERMARK-LIKE MEMORY THAT A WHOLE LIGATURE WAS ORIGINALLY USED FOR THE FOLLOWING LIGATURE code. If that code is ever implemented in regular Unicode it will probably have a different, shorter name. Yet for this discussion, and for experiments where experimental software needs clearly commented source code, such a name for the code point is not unreasonable.

So, suppose that someone posts a message to the Unicode list containing the word astrolabe with an st ligature. Please note that the st ligature is U+FB06. For the purpose of this discussion, let WLMTAWL stand for the WATERMARK-LIKE MEMORY THAT A WHOLE LIGATURE WAS ORIGINALLY USED FOR THE FOLLOWING LIGATURE code point value.

My thinking is that if the word astrolabe arrived as asZWJtrolabe then it is stored as asZWJtrolabe in the archive, yet if it arrived as aU+FB06rolabe then it is stored as aWLMTAWLsZWJtrolabe in the archive. Thus either method of sending the st ligature can be used; both methods result in the archive storing alphabetically sortable text, and in addition the archive records the fact that a whole ligature character was used in the original document.
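
The storage rule in the astrolabe example can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: WLMTAWL is given a HYPOTHETICAL placeholder value, U+E000, not the Private Use Area code suggested elsewhere, and only a few existing U+FB.. ligatures are mapped. The `record_origin` flag reflects the point that use of the watermark is optional, and `sort_key` is a crude stand-in for real collation that simply drops ZWJ and WLMTAWL.

```python
ZWJ = "\u200D"      # ZERO WIDTH JOINER
WLMTAWL = "\uE000"  # HYPOTHETICAL placeholder value for the WLMTAWL code point

# Existing U+FB.. ligatures used in this sketch.
FB_LIGATURES = {"\uFB00": "ff", "\uFB01": "fi", "\uFB02": "fl", "\uFB06": "st"}


def store_in_archive(text: str, record_origin: bool = True) -> str:
    """Convert U+FB.. ligatures to ZWJ sequences, optionally prefixing WLMTAWL.

    Text that already uses ZWJ sequences passes through unchanged.
    """
    out = []
    for ch in text:
        letters = FB_LIGATURES.get(ch)
        if letters is None:
            out.append(ch)
            continue
        if record_origin:
            out.append(WLMTAWL)  # watermark: a whole ligature code arrived here
        out.append(ZWJ.join(letters))
    return "".join(out)


def sort_key(stored: str) -> str:
    """Drop ZWJ and WLMTAWL so both arrival forms sort as plain letters."""
    return stored.replace(ZWJ, "").replace(WLMTAWL, "")


print(repr(store_in_archive("as\u200Dtrolabe")))  # arrived in ZWJ form
print(repr(store_in_archive("a\uFB06rolabe")))    # arrived as U+FB06
```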

The archive files could, if desired, be searched by a program specially written by the database manager to answer a question such as the following.

For all of the ligature codes used in postings to the Unicode list, how many were sent using ZWJ codes and how many were sent using U+FB.. codes?

In order to find the answer to this question the software would simply look for ZWJ occurrences and determine whether or not a WLMTAWL code was present immediately preceding the first character of the ligature sequence.
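
The search just described amounts to a single pass over the stored text. The following Python sketch assumes the archive stores text as in the astrolabe example, treating a ligature sequence as a letter followed by one or more ZWJ-letter pairs. WLMTAWL is again a HYPOTHETICAL placeholder value, and `count_ligature_origins` is a hypothetical name. Rather than examining each ZWJ on its own, it matches whole ligature sequences, so a three-letter ligature such as ffi (two ZWJs) is counted once.

```python
import re

ZWJ = "\u200D"      # ZERO WIDTH JOINER
WLMTAWL = "\uE000"  # HYPOTHETICAL placeholder value for the WLMTAWL code point

# One ligature sequence: letter (ZWJ letter)+, optionally preceded by WLMTAWL.
# Neither ZWJ nor a private-use character matches \w, so each run stops cleanly.
LIGATURE_RUN = re.compile(rf"({WLMTAWL})?\w(?:{ZWJ}\w)+")


def count_ligature_origins(archive_text: str) -> dict:
    """Count ligatures that arrived as whole codes versus as ZWJ sequences."""
    counts = {"whole_code": 0, "zwj": 0}
    for match in LIGATURE_RUN.finditer(archive_text):
        # group(1) is the WLMTAWL character if present, otherwise None.
        key = "whole_code" if match.group(1) else "zwj"
        counts[key] += 1
    return counts


archive = "as\u200Dtrolabe a\uE000s\u200Dtrolabe o\uE000f\u200Df\u200Dice"
print(count_ligature_origins(archive))
```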

So, my idea for a WATERMARK-LIKE MEMORY THAT A WHOLE LIGATURE WAS ORIGINALLY USED FOR THE FOLLOWING LIGATURE code is basically quite straightforward and could easily be used to good advantage. However, its use would not be obligatory: if, say, a database manager has no interest in whether the original of a document used a ZWJ sequence or a U+FB.. code for a ligature, then the WATERMARK-LIKE MEMORY THAT A WHOLE LIGATURE WAS ORIGINALLY USED FOR THE FOLLOWING LIGATURE code need not be used at all in that particular database application.

Naturally, it would be best if such a code were part of regular Unicode, and if more ligatures are encoded in regular Unicode at some future time, then perhaps it would be added as part of the same process. However, thinking that some people might like to try out programming experiments with the technique now, I suggested a particular code within the Private Use Area. If various people do try out such experiments, any files produced could then be interchanged from experimenter to experimenter as part of the research process. Suggesting a particular code also provides a stepping stone, so that an experimenter has a definite place to start.


The golden ligatures collection documents are available on the web at the following address.

William Overington

4 June 2002

This archive was generated by hypermail 2.1.2 : Tue Jun 04 2002 - 07:58:55 EDT