Unicode Plain Text

From: clarkcb@corp.sykes.com
Date: Tue May 20 1997 - 21:13:54 EDT

Next message: Michael Everson: "Romanian terminology"
Previous message: Kenneth Whistler: "Re: Unicode plain text (Was: Line Separator Character)"
Next in thread: Tony Harminc: "Re: Unicode Plain Text"
Maybe reply: Tony Harminc: "Re: Unicode Plain Text"
Maybe reply: Murray Sargent: "RE: Unicode Plain Text"

I'm a little confused by this recent thread. I get the feeling that some
people think Unicode needs additional features to be useable, whereas I
think that the necessary features need to be present in Unicode-supporting
applications and fonts. Maybe I'm misunderstanding, but I'll continue
anyway.

I think maybe the problem is that the definition of "plain text" needs some
refining with respect to Unicode. To me, a Unicode plain text file would
contain ANY Unicode character. It would be the writer's responsibility
(together with an input editor, perhaps) to make sure the file contained the
minimum necessary information to render correctly, e.g., proper placement of
directional indicators, etc., and it would in turn be the application's
responsibility to render the file in a readable fashion, given the
information contained in the file. Keep in mind that even 7-bit ASCII text
still must be "rendered" by an editor on the screen. Also, keep in mind
that, according to the Unicode Standard, compliance does not necessarily
mean full support. An application might not have bidirectional rendering
capabilities, but that does not mean that a Unicode file with a mixture of
English and Hebrew/Arabic with directional indicators is not a plain text
file.
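
To illustrate the point about directional indicators, here is a minimal Python sketch. It assumes nothing beyond the standard library; the helper name directional_marks and the sample string are hypothetical and chosen only for illustration. It shows that the indicators are ordinary code points stored in the text next to the letters, whether or not a given application can render them.

```python
import unicodedata

# Unicode bidirectional formatting characters (marks, embeddings, overrides).
BIDI_CONTROLS = {
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
}

def directional_marks(text):
    """Return the Unicode names of the directional indicators in text, in order."""
    return [unicodedata.name(ch) for ch in text if ch in BIDI_CONTROLS]

# Hypothetical sample: an English sentence with one embedded Hebrew word.
sample = "He said \u202b\u05e9\u05dc\u05d5\u05dd\u202c to me."

print(directional_marks(sample))
# The text survives an encode/decode round trip unchanged, like any plain text.
print(sample.encode("utf-8").decode("utf-8") == sample)
```

An application without bidirectional rendering could still read, store, and pass along such a string unchanged; only the display step differs.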

What makes a plain text file different from any other electronic document,
in my opinion, is the absence of "style" information, such as font, font
size, margins, etc., and additionally, in the case of SGML instances,
procedural markup.

As for usage standards, such as CRLF vs. CR vs. LF vs. LS vs. PS, etc., we
have two options:
1. agree on definitive standards now, and support nothing but, or
2. support everything
Now, I have done enough programming to know that supporting more means more
headaches, but I still feel that the second option is the better one at this
time. Feedback?
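
As a rough sketch of what the second option could look like in practice, the following Python snippet splits text on every terminator listed above. The names LINE_BREAK and split_lines and the sample string are assumptions made for illustration; this is one possible approach, not a recommendation from the Standard.

```python
import re

# Line terminators named above: CRLF, CR, LF, LS (U+2028), PS (U+2029).
# CRLF comes first so that it is consumed as one break rather than two.
LINE_BREAK = re.compile("\r\n|\r|\n|\u2028|\u2029")

def split_lines(text):
    """Split text on any of the supported line terminators."""
    return LINE_BREAK.split(text)

# Hypothetical input that mixes all five conventions.
mixed = "one\r\ntwo\rthree\nfour\u2028five\u2029six"

print(split_lines(mixed))
```

Note that this sketch discards the distinction between a line break and a paragraph break (PS), and that a terminator at the very end of the input yields a trailing empty string. Python's built-in str.splitlines() also recognizes these five terminators, along with a few others.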

Cary

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:34 EDT