Re: Furigana

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Aug 13 2002 - 21:58:28 EDT


Tex asked:

> But does the standard address their removal by receivers (or
> intermediaries), and does removing them include removing the contained
> annotation?

Yes and yes. p. 326:

"On input, a plain text receiver should either preserve all characters
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
or remove the interlinear annotation characters as well as the annotating
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
text..."
^^^^

>
> I can imagine an application that doesn't support I.A. deciding the
> annotation is out of band and can't be preserved in its plain text
> output, and so justifiably strips it as well.
> Does the standard say what to do with "for internal use only"
> characters?

Yes. Unicode 3.1:

D7b: Noncharacter: a code point that is permanently reserved for
     internal use, and that should never be interchanged.

C10: A process shall make no change in a valid coded character
     representation other than the possible replacement of
     character sequences by their canonical-equivalent sequences
     or the deletion of noncharacter code points, if that process
     purports not to modify the interpretation of that coded
     character sequence.
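
To make the C10 allowance concrete, here is a minimal sketch of a filter
that deletes noncharacter code points and touches nothing else. It assumes
the noncharacter set is U+FDD0..U+FDEF plus the last two code points of
every plane (U+nFFFE and U+nFFFF); the function names are made up for
illustration and do not come from the standard.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is a noncharacter code point (assumed set, see above)."""
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of each plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def delete_noncharacters(text: str) -> str:
    """Drop noncharacters; every other code point is passed through untouched."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))
```

Note that U+FFF9..U+FFFB are not matched by this test, so a filter like
this leaves the interlinear annotation characters alone.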

The interlinear annotation characters fall in a gray zone, since
they are not noncharacters, but by rights ought to have been.
Since they are standard characters though, the standard has to
provide some guidelines -- and it is simply safer, if you encounter
and delete them, to also delete the annotation. You would be changing
the interpretation of the text, but in a knowing, intended manner.

>
> I would have thought the rule was to ignore and pass along.

In general, yes, as for everything else, including unassigned
code points. If your role in life is as a database, for example,
or some other kind of data source or data pipe, then minimal
meddling with the bytes is safest. But other kinds of processes
will do graduated manipulations, depending on what they are
aiming for.

--Ken


