Unicode Plain Text

From: clarkcb@corp.sykes.com
Date: Tue May 20 1997 - 21:13:54 EDT


I'm a little confused by this recent thread. I get the feeling that some
people think Unicode needs additional features to be useable, whereas I
think that the necessary features need to be present in Unicode-supporting
applications and fonts. Maybe I'm misunderstanding, but I'll continue
anyway.

I think maybe the problem is that the definition of "plain text" needs some
refining with respect to Unicode. To me, a Unicode plain text file would
contain ANY Unicode character. It would be the writer's responsibility
(together with an input editor, perhaps) to make sure the file contained the
minimum necessary information to render correctly, e.g., proper placement of
directional indicators, etc., and it would in turn be the application's
responsibility to render the file in a readable fashion, given the
information contained in the file. Keep in mind that even 7-bit ASCII text
still must be "rendered" by an editor on the screen. Also, keep in mind
that, according to the Unicode Standard, compliance does not necessarily
mean full support. An application might not have bidirectional rendering
capabilities, but that does not mean that a Unicode file with a mixture of
English and Hebrew/Arabic with directional indicators is not a plain text
file.
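
To make the point about directional indicators concrete, here is a minimal sketch in Python. It assumes the "directional indicators" in question are the explicit marks, embeddings, and overrides at U+200E, U+200F, and U+202A through U+202E; the function name and the sample string are hypothetical, chosen only for illustration. The indicators are ordinary characters sitting in the text stream, which an application can either honor when rendering or merely report.

```python
import unicodedata

# Explicit directional formatting characters (an assumed set for this sketch):
# LRM, RLM, LRE, RLE, PDF, LRO, RLO.
DIRECTIONAL = {"\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}

def directional_indicators(text):
    """Return (index, character name) for each directional indicator in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in DIRECTIONAL
    ]

# English text with an embedded Hebrew word wrapped in RLE ... PDF.
sample = "abc \u202b\u05e9\u05dc\u05d5\u05dd\u202c def"
for index, name in directional_indicators(sample):
    print(index, name)
```

An application without bidirectional rendering could still open such a file, list or ignore the indicators, and display the characters in storage order. The file itself contains nothing but characters.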

What makes a plain text file different from any other electronic document,
in my opinion, is the lack vs. the presence of "style" information, such as
font, font size, margins, etc., and additionally, in the case of SGML
instances, procedural markup.

As for usage standards, such as CRLF vs. CR vs. LF vs. LS vs. PS, etc., we
have two options:
1. agree on definitive standards now, and support nothing but, or
2. support everything
Now, I have done enough programming to know that supporting more means more
headaches, but I still feel that the second option is the better one at this
time. Feedback?
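
As a rough illustration of what the second option might cost for line breaks alone, here is a minimal Python sketch that accepts all five conventions named above. The function name is hypothetical, and treating a trailing break as a terminator rather than as the start of an empty final line is an assumption of this sketch.

```python
import re

# CRLF is listed first so the pair counts as one break, not two.
# U+2028 is LINE SEPARATOR (LS); U+2029 is PARAGRAPH SEPARATOR (PS).
_LINE_BREAK = re.compile("\r\n|\r|\n|\u2028|\u2029")

def split_lines(text):
    """Split text on any of CRLF, CR, LF, LS, or PS."""
    lines = _LINE_BREAK.split(text)
    # A break at the very end terminates the last line; drop the empty tail.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

print(split_lines("one\r\ntwo\rthree\nfour\u2028five\u2029six"))
```

This version discards the distinction between LS and PS, which a real application might want to keep. Python's built-in str.splitlines() also handles all five, but it splits on several other characters as well (form feed, for example), so the explicit pattern is used here to match only the conventions under discussion.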

Cary


