Courtyard Codes and the Private Use Area (derives from Re: Encoding of symbols and a "lock"/"unlock" pre-proposal)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Fri May 24 2002 - 07:02:22 EDT


Peter Constable included the following in his post.

>As for
>PUA, many people have their own plans regarding U+F300..U+F3FF. For my own
>part, my plans for U+F300..U+F3FF almost certainly do not involve padlock
>symbols.

Thank you for your email.

As is well known, the Unicode Consortium will not endorse any code point
allocations in the Private Use Area and everyone has the right to allocate
none, some or all code points in the Private Use Area as he or she chooses,
and to publish them if he or she so chooses.

This is an interesting situation. If one views it from the inside looking out, then examining the code points of a Unicode plain text file cannot establish with any certainty the intended meaning of a Private Use Area code point used in that file.

However, if one views the situation from the outside looking in, a somewhat
different situation arises.

Suppose that I define a .eut file format to be structurally a Unicode plain
text file with the added feature that all code points that are within the
Unicode Private Use Area are defined to have the meanings which I give them
in my eutocode set of code point allocations.

So, .eut could be a rigorously defined file format, just as .bmp and .png are. If a wordprocessing package had an option for reading in files in .eut format, then there would be no confusion whatsoever about the meaning of, say, U+E707: it would be a ct ligature.
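As an illustration only, here is a minimal sketch of how a reader for the .eut format might interpret a file. Only the U+E707 allocation (ct ligature) comes from this post. The table name EUTOCODE and the function names are hypothetical, and a real reader would load the full eutocode set. The Private Use Area ranges are those of the Unicode Standard: U+E000..U+F8FF in plane 0, plus planes 15 and 16.

```python
from __future__ import annotations

# Hypothetical eutocode table; only U+E707 is taken from the post.
EUTOCODE = {
    0xE707: "ct ligature",
}

def is_private_use(cp: int) -> bool:
    """True for the BMP Private Use Area and the two supplementary PUA planes."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

def interpret_eut(text: str) -> list[str]:
    """Describe each character; every PUA code point gets its eutocode meaning."""
    out = []
    for ch in text:
        cp = ord(ch)
        if is_private_use(cp):
            out.append(EUTOCODE.get(cp, f"unallocated eutocode U+{cp:04X}"))
        else:
            out.append(ch)  # ordinary Unicode characters pass through unchanged
    return out

print(interpret_eut("a\ue707"))  # ['a', 'ct ligature']
```

Because the file format, not the code points, fixes the meaning, the same lookup gives the same answer in every .eut file.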

Now, suppose I define a .uto file format to be structurally a Unicode plain text file with the added feature that all code points within the U+F3.. block (U+F300..U+F3FF) of the Private Use Area have the meanings of a set of codes called Courtyard Codes. All other Private Use Area code points have an undefined meaning, unless a sequence of Courtyard Codes has indicated the type tray from which all subsequent Private Use Area codes outside the U+F3.. block are to be regarded as coming.

A wordprocessing package could be programmed by its manufacturer to accept input in .uto file format, with accuracy of meaning for every code point used in the file, even if some Private Use Area code points had two different meanings in two parts of the same document.
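The following is a sketch, not a definition, of how a .uto reader might keep track of the current type tray. This post does not say which Courtyard Code sequence selects a type tray. For that reason, the sketch supplies the tray name alongside each run of text instead of decoding it from the file. The tray names, their contents, and the function names are all hypothetical.

```python
from __future__ import annotations

# Hypothetical type trays: the same PUA code point means different things.
TRAYS = {
    "eutocode": {0xE707: "ct ligature"},
    "another-set": {0xE707: "some other character"},
}

def is_private_use(cp: int) -> bool:
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)

def is_courtyard(cp: int) -> bool:
    return 0xF300 <= cp <= 0xF3FF  # the U+F3.. block

def read_uto(runs: list[tuple[str | None, str]]) -> list[str]:
    """runs: (tray name or None, text) pairs in document order."""
    result = []
    for tray_name, text in runs:
        tray = TRAYS.get(tray_name) if tray_name else None
        for ch in text:
            cp = ord(ch)
            if is_courtyard(cp):
                result.append(f"Courtyard Code U+{cp:04X}")
            elif is_private_use(cp):
                # With no tray selected, the meaning stays undefined.
                meaning = tray.get(cp) if tray else None
                result.append(meaning or f"undefined U+{cp:04X}")
            else:
                result.append(ch)
    return result

doc = [("eutocode", "a\ue707"), ("another-set", "\ue707"), (None, "\ue707\uf3c1")]
print(read_uto(doc))
# ['a', 'ct ligature', 'some other character', 'undefined U+E707', 'Courtyard Code U+F3C1']
```

U+E707 therefore has two different meanings in two parts of one document, and its meaning is undefined where no tray has been selected. U+F3C1 is recognized everywhere.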

----

I like to imagine an analogy for the way that Unicode code points are defined, as if there were a large kitchen table which is plane 0. Pieces of coloured paper are laid onto most parts of the table, always taking care that no piece of paper overlaps any other, so that the table surface is covered by only one thickness of paper. One region, about one tenth of the total area of the table, is called the Private Use Area, and here paper can be piled; perhaps 500 sheets could be piled upon it. So, if someone asks of some particular place on the surface of the table "What colour is the paper?", then for parts of the table outside the Private Use Area the colour of the paper can be stated. For the Private Use Area, by contrast, the colour cannot be stated with certainty: it depends upon which piece of paper is being viewed at any one time. Suppose, however, that the people who are placing paper onto the Private Use Area agree amongst themselves that they like the look of that nice yellow square of paper that takes up a small part of the Private Use Area, and that they will voluntarily avoid placing any paper on top of it. One would then end up with a Private Use Area that has coloured paper piled up all over it, except in one small area where there is a yellow square. The net effect would be that the colour of the paper in the area covered by the yellow square would be as uniquely defined as anywhere outside the Private Use Area.

Now, the question that naturally arises is as follows: will all end users agree to keep the U+F3.. area only for the Courtyard Codes? Who knows? I suggest, however, that they may well do so, because I hope that, when they consider the matter, people will feel that it is to their own advantage.

I feel that if everybody who wishes to make definitions in the Private Use Area learned of the existence of the Courtyard Codes and found the features that they could provide extremely useful, especially as those features may in time become built into widely used software packages, then they might well do so.

What would this take?

Ease of use. Where a wordprocessing package, a desktop publishing package or similar software has an option for reading in a Unicode plain text file, it would also have an option for reading in a .uto file. The Courtyard Codes would need to be well defined, publicly available, free to use and free of legal entanglements. Please note that I have chosen the name Courtyard Codes for the system because Courtyard and Codes are two ordinary English words, not specially coined ones. I got the idea of using the word Courtyard from the notion of a courtyard garden. If people like to think of the imagery of a courtyard garden, a pleasant place to be with various items within it, then fine.

Potential benefit. The Courtyard Codes provide facilities which may be useful for fairly simple, widely available software.

Here are the codes which I have defined so far, except for the classification codes and the padlock codes, which I have previously published. I hope that other codes will be added gradually. However, as this topic is current, I shall take the opportunity to post the codes that I have already defined. I hope that end users of the Unicode system around the world will become aware of them, look at them and feel that avoiding making any definitions in the U+F3.. block of the Private Use Area would be to their advantage, so that they keep open the possibility of using Courtyard Codes in conjunction with their own use of the Private Use Area.

Please note, for the avoidance of doubt, that in my research for the eutocode system I am elsewhere defining uses of Private Use Area codes for specific purposes. These include graphics, ligatures such as ct and long s ligatures, mouse events, button pushes on the hand-held infra-red control device of a multimedia television, and embedding 1456 object code into a Unicode plain text file. However, I am not asking other people to avoid overlapping those codes. So, although I use U+E707 to mean a ct ligature within eutocode, I fully expect, and am entirely happy, that other people will define U+E707 to mean something else. I am simply asking end users, if possible, not to overlap the U+F3.. block. If end users keep the U+F3.. block to one meaning, then all end users can use the features provided by the Courtyard Codes in conjunction with any character sets that they design in the Private Use Area. Software packages may then in future understand those codes, and all uses of the Private Use Area could be classified using the classification codes. I feel that if end users choose to have this use of the U+F3.. block widely accepted amongst themselves, then that will be to everybody's advantage.

Readers might like to know that eutocode is being designed primarily for applications involving the broadcasting of digital multimedia on digital television channels. The DVB-MHP (Digital Video Broadcasting - Multimedia Home Platform) system, details of which are at the http://www.mhp.org website, uses Java for broadcast software, and Java uses Unicode. Those Java programs could be written to accept the .eut and .uto file formats as data. My request that end users agree to keep the U+F3.. block free for the classification system and the basic formatting codes may therefore, I hope, have far-reaching implications for the use of Private Use Area codes with software that recognizes these formatting codes.

Please note that the formatting codes do not specify details such as the width of table cells or a specific fount. The idea is as if someone has gone to a print shop and explained in general terms to the printer what he or she is looking to achieve with the layout; the person at the print shop then does his or her best with what he or she has available. Courtyard Codes are not intended to be a full markup system. Rather, they are intended to be a fairly basic, highly portable system which nevertheless has scope to provide layout effects that are practical and potentially quite stylish.

Courtyard Codes could also be used with just regular Unicode, with no Private Use Area codes except for the Courtyard Codes themselves. This usage would allow stylish layout to be achieved using an almost plain text file.

----

U+F3A2 PLEASE LIGATE THE NEXT TWO CHARACTERS
U+F3A3 PLEASE LIGATE THE NEXT THREE CHARACTERS
U+F3A4 PLEASE LIGATE THE NEXT FOUR CHARACTERS

U+F3A8 PLEASE SWASH THE NEXT PRINTABLE ITEM
U+F3A9 PLEASE ALTERNATIVE SWASH THE NEXT PRINTABLE ITEM

When a swash version of a ligature is requested, and the ligature itself is requested using U+F3A2, U+F3A3 or U+F3A4, the swash request code precedes the ligature request code. Swash versions of any character can be requested, not just of ligatures; indeed, a swash request for a ligature is thought likely to be a rare occurrence.
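The ordering rule above can be sketched as follows. A ligature request turns the next N characters into one printable item, and a pending swash request attaches to that item. The sketch handles only the swash and ligature codes. It assumes the order the post specifies (swash first) and does not attempt to repair the reverse order. The function name is hypothetical.

```python
from __future__ import annotations

# Courtyard Code values from the post.
LIGATE = {0xF3A2: 2, 0xF3A3: 3, 0xF3A4: 4}  # ligate the next N characters
SWASH, ALT_SWASH = 0xF3A8, 0xF3A9

def printable_items(text: str) -> list[tuple[str, str | None]]:
    """Group text into (printable item, swash request) pairs."""
    items = []
    swash = None
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp in (SWASH, ALT_SWASH):
            # Remember the request until the next printable item arrives.
            swash = "swash" if cp == SWASH else "alternative swash"
            i += 1
            continue
        if cp in LIGATE:
            n = LIGATE[cp]
            item = text[i + 1:i + 1 + n]  # the next N characters form one item
            i += 1 + n
        else:
            item = text[i]
            i += 1
        items.append((item, swash))
        swash = None  # a swash request is used up by one printable item
    return items

print(printable_items("\uf3a8\uf3a2ct"))  # [('ct', 'swash')]
print(printable_items("\uf3a3ffib"))      # [('ffi', None), ('b', None)]
```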

If U+F3A8 is obeyed and no swash character is available, then the ordinary version of the letter is displayed.

The U+F3A9 code point exists because some founts may have two swash versions of a particular letter; U+F3A9 gives access to the second swash version. If an alternative swash character cannot be displayed, U+F3A9 acts as if it were U+F3A8.
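The fallback chain just described can be written as a small function. U+F3A9 falls back to U+F3A8, which in turn falls back to the ordinary form. In the sketch, the fount's capabilities are modelled as a hypothetical mapping from a printable item to the number of swash versions it has (0, 1 or 2). The function name is also hypothetical.

```python
from __future__ import annotations

def displayed_form(item: str, request: str | None, swash_forms: dict[str, int]) -> str:
    """Return which form is actually displayed for one printable item."""
    available = swash_forms.get(item, 0)
    if request == "alternative swash":
        if available >= 2:
            return "alternative swash"
        request = "swash"  # U+F3A9 then acts as if it were U+F3A8
    if request == "swash" and available >= 1:
        return "swash"
    return "ordinary"  # no swash requested, or no swash version available

fount = {"Q": 2, "R": 1}  # hypothetical fount: two swash Qs, one swash R
print(displayed_form("Q", "alternative swash", fount))  # alternative swash
print(displayed_form("R", "alternative swash", fount))  # swash
print(displayed_form("S", "swash", fount))              # ordinary
```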

----

Here are eight code points for signalling italic and bold. The idea is that the processor will have two Boolean variables, ITALIC and BOLD, which are by default false.

U+F3C0 PLAIN - ITALIC:=false; BOLD:=false;
U+F3C1 ITALIC - ITALIC:=true; BOLD:=false;
U+F3C2 BOLD - ITALIC:=false; BOLD:=true;
U+F3C3 BOLD ITALIC - ITALIC:=true; BOLD:=true;
U+F3C4 REMOVE ITALIC - ITALIC:=false;
U+F3C5 ADD ITALIC - ITALIC:=true;
U+F3C6 REMOVE BOLD - BOLD:=false;
U+F3C7 ADD BOLD - BOLD:=true;
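The two Boolean variables above behave as a small state machine. The sketch below encodes each code as the pair of new values it assigns, where None means "leave unchanged". It then reports the ITALIC and BOLD state in effect for each ordinary character. The function name is hypothetical.

```python
from __future__ import annotations

# Each code maps to (new ITALIC, new BOLD); None leaves that variable unchanged.
STYLE_CODES = {
    0xF3C0: (False, False),  # PLAIN
    0xF3C1: (True, False),   # ITALIC
    0xF3C2: (False, True),   # BOLD
    0xF3C3: (True, True),    # BOLD ITALIC
    0xF3C4: (False, None),   # REMOVE ITALIC
    0xF3C5: (True, None),    # ADD ITALIC
    0xF3C6: (None, False),   # REMOVE BOLD
    0xF3C7: (None, True),    # ADD BOLD
}

def styled_chars(text: str) -> list[tuple[str, bool, bool]]:
    """Return (character, italic, bold) for every non-style character."""
    italic = bold = False  # both variables are false by default
    out = []
    for ch in text:
        change = STYLE_CODES.get(ord(ch))
        if change is None:
            out.append((ch, italic, bold))
            continue
        new_italic, new_bold = change
        if new_italic is not None:
            italic = new_italic
        if new_bold is not None:
            bold = new_bold
    return out

print(styled_chars("a\uf3c7b\uf3c5c\uf3c6d\uf3c0e"))
# [('a', False, False), ('b', False, True), ('c', True, True), ('d', True, False), ('e', False, False)]
```

The sketch also shows the difference between U+F3C1 ITALIC, which clears BOLD, and U+F3C5 ADD ITALIC, which leaves BOLD as it was.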

Here are some codes for type face choice.

U+F3C8 PLEASE USE DEFAULT FACE
U+F3C9 PLEASE USE SERIFED FACE
U+F3CA PLEASE USE SANSERIF FACE
U+F3CB PLEASE USE ORNATE FACE
U+F3CC PLEASE USE FORMAL SCRIPT FACE
U+F3CD PLEASE USE INFORMAL SCRIPT FACE
U+F3CE PLEASE USE MONOSPACED FACE
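In the "print shop" spirit of these codes, a face code names a category of type face rather than a specific fount. The sketch below maps each category to whatever the device actually has installed. The fallback to the default face when a category is missing is an assumption, since this post does not state a fallback for faces. The `installed` mapping, the fount names and the function name are hypothetical.

```python
from __future__ import annotations

FACES = {
    0xF3C8: "default",
    0xF3C9: "serifed",
    0xF3CA: "sanserif",
    0xF3CB: "ornate",
    0xF3CC: "formal script",
    0xF3CD: "informal script",
    0xF3CE: "monospaced",
}

def choose_face(cp: int, installed: dict[str, str]) -> str | None:
    """Map a face request to an installed fount; None if cp is not a face code."""
    category = FACES.get(cp)
    if category is None:
        return None
    # Assumption: an unavailable category falls back to the default face.
    return installed.get(category, installed["default"])

installed = {"default": "Plain Roman", "monospaced": "Typewriter"}  # hypothetical
print(choose_face(0xF3CE, installed))  # Typewriter
print(choose_face(0xF3CC, installed))  # Plain Roman
```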

Here are some codes for formatting.

U+F3D0 LEFT ALIGN
U+F3D1 RIGHT ALIGN
U+F3D2 CENTRE
U+F3D3 JUSTIFY
U+F3D4 SINGLE COLUMN
U+F3D5 DOUBLE COLUMN FOR THE REST OF THIS PAGE

U+F3D8 TABLE START
U+F3D9 TABLE END
U+F3DA START THE NEXT TABLE ROW
U+F3DB START THE NEXT TABLE COLUMN IN THE PRESENT ROW
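The following is a sketch of how a reader might turn the four table codes into rows of cells. The assumptions are that TABLE START implicitly opens the first row and its first cell, and that tables are not nested. Other Courtyard Codes inside a cell are simply kept as cell text. The function name is hypothetical.

```python
TABLE_START, TABLE_END = 0xF3D8, 0xF3D9
NEXT_ROW, NEXT_COLUMN = 0xF3DA, 0xF3DB

def parse_tables(text: str) -> list:
    """Split text into plain strings and tables (lists of rows of cell strings)."""
    blocks = []
    table = None  # the table currently being read, if any
    buf = []      # plain text outside any table
    for ch in text:
        cp = ord(ch)
        if cp == TABLE_START:
            if buf:
                blocks.append("".join(buf))
                buf = []
            if table is not None:  # no nesting: close any open table first
                blocks.append(table)
            table = [[""]]  # assumption: first row and first cell open implicitly
        elif table is not None and cp == NEXT_ROW:
            table.append([""])
        elif table is not None and cp == NEXT_COLUMN:
            table[-1].append("")
        elif table is not None and cp == TABLE_END:
            blocks.append(table)
            table = None
        elif table is not None:
            table[-1][-1] += ch
        else:
            buf.append(ch)
    if buf:
        blocks.append("".join(buf))
    if table is not None:  # unterminated table: keep what was read
        blocks.append(table)
    return blocks

print(parse_tables("Hi\uf3d8a\uf3dbb\uf3dac\uf3dbd\uf3d9!"))
# ['Hi', [['a', 'b'], ['c', 'd']], '!']
```

Cell widths are deliberately absent, in keeping with the post: the renderer decides how wide each column should be.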

Here are some codes for text colour.

U+F3E0 BLACK
U+F3E1 BROWN
U+F3E2 RED
U+F3E3 ORANGE
U+F3E4 YELLOW
U+F3E5 GREEN
U+F3E6 BLUE
U+F3E7 MAGENTA
U+F3E8 GREY
U+F3E9 WHITE
U+F3EA CYAN
U+F3EB PINK
U+F3EC DARK GREY
U+F3ED LIGHT GREY
U+F3EE LAVENDER
U+F3EF MINT
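The sixteen colour codes occupy consecutive code points, so a reader can decode them by offset from U+F3E0. The sketch returns only the colour name. No exact colour values are specified in this post, so choosing the actual shade is left to the renderer. The function name is hypothetical.

```python
from __future__ import annotations

# Colour names in code point order, U+F3E0..U+F3EF.
COLOURS = [
    "black", "brown", "red", "orange", "yellow", "green", "blue", "magenta",
    "grey", "white", "cyan", "pink", "dark grey", "light grey", "lavender", "mint",
]

def colour_name(cp: int) -> str | None:
    """Return the colour requested by a colour code, or None for other code points."""
    if 0xF3E0 <= cp <= 0xF3EF:
        return COLOURS[cp - 0xF3E0]
    return None

print(colour_name(0xF3EB))  # pink
```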

Here are some codes for type sizes.

U+F3F0 DEFAULT SIZE
U+F3F1 6 POINT
U+F3F2 8 POINT
U+F3F3 10 POINT
U+F3F4 12 POINT
U+F3F5 14 POINT
U+F3F6 18 POINT
U+F3F7 24 POINT
U+F3F8 30 POINT
U+F3F9 36 POINT
U+F3FA 48 POINT
U+F3FB 60 POINT
U+F3FC 72 POINT
U+F3FD 96 POINT
U+F3FE 144 POINT
U+F3FF 192 POINT
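The type sizes are not evenly spaced, so a reader needs a lookup table rather than arithmetic on the code point. The sketch also includes a hypothetical "print shop" fallback that picks the nearest size a device can actually render. This fallback is an assumption, not something the post specifies. The function names are likewise hypothetical.

```python
from __future__ import annotations

# Point sizes for U+F3F0..U+F3FF in order; None stands for DEFAULT SIZE.
SIZES = [None, 6, 8, 10, 12, 14, 18, 24, 30, 36, 48, 60, 72, 96, 144, 192]

def requested_size(cp: int) -> int | None:
    """Return the point size requested by a size code (None means default size)."""
    if not 0xF3F0 <= cp <= 0xF3FF:
        raise ValueError(f"U+{cp:04X} is not a type size code")
    return SIZES[cp - 0xF3F0]

def nearest_available(points: int, available: list[int]) -> int:
    """Assumed fallback: the closest size the device has.
    Ties go to the smaller size because the candidates are sorted first."""
    return min(sorted(available), key=lambda size: abs(size - points))

print(requested_size(0xF3F4))                  # 12
print(nearest_available(30, [36, 24, 12]))     # 24
```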

I hope that these Courtyard Codes will be of interest to end users. I am hoping to add more codes gradually and then to put a full document on the web at http://www.users.globalnet.co.uk/~ngo, which is our family webspace in England.

William Overington

24 May 2002


