Re: Should furigana be considered part of "plain text"?

From: Doug Ewell
Date: Sun Jul 02 2000 - 13:22:41 EDT

11-Digit Boy <> wrote:

> ---- John Hudson <> wrote:
>> Note that this is a text tagging issue, not a Unicode issue, unless
>> you feel that there is some need to indicate Ruby/Furigana in plain
>> text. At some point, plain text ceases to be plain if you decide
>> that layout information needs to be encoded.
> Anybody willing to comment on this???

In Section 13.6 of the Unicode Standard Version 3.0 (pages 325-326), the
use of the Interlinear Annotation characters (U+FFF9 through U+FFFB)
is described. Figure 13-3 shows a very clear example of how furigana
would be encoded using these characters.
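As a rough illustration of that encoding (not a reproduction of Figure 13-3), here is a minimal Python sketch. The helper name annotate and the sample word are my own assumptions; only the three code points and their order (anchor, base text, separator, annotation, terminator) come from the standard.

```python
# The three Interlinear Annotation characters, by their Unicode names.
IA_ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
IA_SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
IA_TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR


def annotate(base: str, reading: str) -> str:
    """Hypothetical helper: wrap base text and its furigana reading.

    Layout: ANCHOR, base text, SEPARATOR, annotating text, TERMINATOR.
    """
    return f"{IA_ANCHOR}{base}{IA_SEPARATOR}{reading}{IA_TERMINATOR}"


# Illustrative sample: the kanji word with its hiragana reading.
encoded = annotate("漢字", "かんじ")
```

The result is still an ordinary string of characters; whether the receiver shows the reading above the base text, beside it, or not at all is up to the receiver.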

To me this indicates that the UTC considered furigana to be a Unicode
issue, not necessarily only a text tagging issue.

On the other hand, the standard does go on to specify: "Usage of the
annotation characters in plain text interchange is strongly discouraged
without prior agreement between the sender and the receiver because the
content may be misinterpreted otherwise." The three IA characters and
the annotating text should be stripped out unless the sender knows that
the receiver will be able to handle them. So it is a complex issue.
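The stripping step could look something like the following Python sketch. The function name strip_annotations is hypothetical, and the sketch assumes annotations are not nested; it keeps the base text, drops the annotating text, and then removes any stray IA characters left over from malformed input.

```python
import re

# One annotated run: ANCHOR, base text, then one or more
# SEPARATOR + annotation parts, then TERMINATOR.
# Assumption: runs are not nested inside each other.
_IA_RUN = re.compile(
    "\uFFF9"                        # anchor
    "([^\uFFF9-\uFFFB]*)"           # group 1: the base text to keep
    "(?:\uFFFA[^\uFFF9-\uFFFB]*)*"  # separator(s) and annotating text
    "\uFFFB"                        # terminator
)

# Any IA character on its own (left over from unbalanced input).
_IA_STRAY = re.compile("[\uFFF9-\uFFFB]")


def strip_annotations(text: str) -> str:
    """Hypothetical helper: remove IA characters and annotating text."""
    base_only = _IA_RUN.sub(r"\1", text)
    return _IA_STRAY.sub("", base_only)
```

A sender who is unsure of the receiver would run the text through something like this before interchange, so that only the base text goes out.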

The problem with the phrase "plain text ceases to be plain if you decide
that layout information needs to be encoded" is the word "layout." In
the broadest sense, line and paragraph separation could be considered
"layout," and nobody would suggest doing away with the plain-text
characters needed to control those functions.

-Doug Ewell
 Fullerton, California
