The existing rules for U+FFF9 through to U+FFFC. (spins from Re: Furigana)

From: William Overington
Date: Thu Aug 15 2002

John Cowan wrote as follows.

>In essence, though not formally, U+FFF9..U+FFFC are non-characters as
>well, and the Unicode "semantics" just tells what programs *may* find them
>useful for. Unicode 4.0 editors: it might be a good idea to emphasize
>the close relationship of this small repertoire with the non-characters.

That is not what the specification says. Something can only be emphasised
if it is true in the first place! If it is desired to make U+FFF9 through
to U+FFFC noncharacters then that needs to be done explicitly with a fair
opportunity for people to object and make representations before a decision
is made.

A saying of my own is as follows.

When goalposts are moved, aromatic herbs should be scattered around.

I had not known about the annotation characters previously, yet, having now
read the published rules in Chapter 13 as a result of this thread, it seems
to me that these are not noncharacters.
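
As a rough illustration of that distinction, here is a minimal Python sketch. It assumes the usual definition of the noncharacters (U+FDD0 through U+FDEF, plus the last two code points of every plane); the helper name is_noncharacter is hypothetical and is not taken from the specification.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """Assumed definition: U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# Report the General Category and noncharacter status of U+FFF9..U+FFFC.
for cp in range(0xFFF9, 0xFFFD):
    print(f"U+{cp:04X}", unicodedata.category(chr(cp)), is_noncharacter(cp))
```

Under that definition none of the four code points is a noncharacter, whereas U+FFFE, two code points further on, is one.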

It appears to me that the use of the annotation characters in document
interchange is never forbidden and is strongly discouraged only where there
is no prior agreement between the sender and the receiver, and that the
strong discouragement arises because the content may otherwise be
misinterpreted. So, if there is a prior agreement, then there is no problem
about using them in interchanged documents.

There appears to be nothing that suggests that U+FFFC cannot be used in an
interchanged document.

I know little about Bliss symbols, though I have seen a few of them and have
read a brief introduction to them, yet it seems to me that annotating Bliss
symbols with English or Swedish is entirely within the specification, and
would be no more than strongly discouraged even if there were no prior
agreement between the sender and the receiver.

Further, it seems to me from the published rules that these annotation
characters could possibly be used to provide a footnote annotation facility
within a plain text file, so that, if a plain text file is being printed out
in book format, then a footnote about a word or phrase could be encoded
using this technique so that the rendering software could place the footnote
on the same page as the word or phrase which is being annotated, regardless
of whether that word or phrase occurs near the start, middle or end of that
page. It seems to me that the statement of the meaning of U+FFFA means that
the arrangements in Figure 13-3 of the specification are just examples,
though as the word "exact" is used, perhaps they are guiding examples, and
the use in footnotes perhaps stretches the variation from the examples in
the diagram.
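
A minimal sketch of how rendering software might pull such footnotes out of a plain text file, in Python. It assumes simple, non-nested sequences of the form U+FFF9 annotated-text U+FFFA annotation U+FFFB; the function name extract_footnotes and the "[n]" marker style are hypothetical choices for illustration, not anything the specification prescribes.

```python
import re

ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"

# Matches one non-nested annotation sequence: anchor, text, separator, note, terminator.
_ANNOTATION = re.compile(
    f"{ANCHOR}([^{ANCHOR}{SEPARATOR}{TERMINATOR}]*)"
    f"{SEPARATOR}([^{ANCHOR}{TERMINATOR}]*){TERMINATOR}"
)

def extract_footnotes(text: str):
    """Return (body, notes): body keeps the annotated text followed by a
    numbered marker; notes holds the annotations in order of appearance."""
    notes = []

    def replace(match):
        notes.append(match.group(2))
        return f"{match.group(1)}[{len(notes)}]"

    return _ANNOTATION.sub(replace, text), notes

sample = f"The {ANCHOR}thermocouple{SEPARATOR}A sensor made of two metals.{TERMINATOR} was fixed."
body, notes = extract_footnotes(sample)
print(body)
print(notes)
```

A page layout routine could then place each entry of notes at the foot of whichever page its marker falls on.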

An interesting point for consideration is as to whether the following
sequence is permitted in interchanged documents.

U+FFF9 U+FFFC U+FFFA Temperature variation with time. U+FFFB

That is, the annotated text is an object replacement character and the
annotation is a caption for a graphic.
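
To make the sequence concrete, here is a small Python sketch that builds it and reads the caption back. The function names captioned_object and read_caption are hypothetical, and the sketch takes no position on whether the sequence is permitted; it only shows that the structure is unambiguous to a program.

```python
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"
OBJECT = "\uFFFC"  # OBJECT REPLACEMENT CHARACTER

def captioned_object(caption: str) -> str:
    """Build U+FFF9 U+FFFC U+FFFA caption U+FFFB."""
    return f"{ANCHOR}{OBJECT}{SEPARATOR}{caption}{TERMINATOR}"

def read_caption(sequence: str):
    """Return the caption if the sequence annotates a single U+FFFC, else None."""
    prefix = ANCHOR + OBJECT + SEPARATOR
    if sequence.startswith(prefix) and sequence.endswith(TERMINATOR):
        return sequence[len(prefix):-1]
    return None

seq = captioned_object("Temperature variation with time.")
print(" ".join(f"U+{ord(c):04X}" for c in seq[:3]))
print(read_caption(seq))
```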

It seems to me that, if that is indeed permissible, it could potentially be
a useful facility.

On balance, it seems to me that if both sender and receiver are clear as to
what is meant, then the use of annotation characters for Bliss symbols, for
footnotes and for captions for illustrations harms no one. A "person skilled
in the art" who seeks to use the file without knowledge of the
interpretation agreement which should ideally exist between sender and
receiver, and who has only the Unicode specification to go on, would be
unlikely to get a wrong interpretation of the intended meaning, even if the
actual graphical layout were imprecise, as the Unicode standard locks
together the two parts of the annotation sequence and shows that one of the
parts is the annotation for the other part.

William Overington

15 August 2002


