Re: Plain Text

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Tue Jul 06 1999 - 20:31:02 EDT


> > So at minimum, a text file should be tagged according to character set.
>
> Whoa! Wait a minute. How do we get from here to there?
>
> If it's tagged, it's not a *plain* text file, but something else.
>
Sorry, I meant externally tagged, e.g. in the directory entry, along
with the size, date, etc. (The lack of this kind of external tagging
is a pet peeve of long duration, but is not exactly relevant to this
discussion.)

> The way ahead out of the character set identity morass for "text files"
> is to use the Universal Character Set -- that way, once again, we
> will know how to interpret plain text files.
>
Agreed! Well... At least if we are successful, and some new consortium
doesn't come along xx years from now and declare Unicode to be "legacy"
and its own new-and-improved universal encoding to be the only one to
use from now on. At which point, we might need to differentiate
"legacy" Unicode data from the new code, just as we now need to
distinguish Unicode from Macintosh Quickdraw, Latin-1, etc. (Saying
there will be only one character set in the future is like saying a
network address can be 8 bits because there will never be more than 256
computers on a network :-)

> The rest of this discussion is about something other than what
> the Unicode Standard means by "plain text", and has, as far as I can
> tell, more to do with devising a kind of lowest common denominator
> document format standard for interoperability. While people on this list
> may find that interesting to discuss, it is rather orthogonal to the
> intended scope of the Unicode Standard.
>
If it is, it shouldn't be. If we rely on some other organization to
worry about this (which one has the authority?) and Unicode outlives
the standards and products of that organization, then we're back to "all
bets are off".

On the other hand, if we can back up the statement that Unicode is a
plain-text standard with a definition of plain text that incorporates
"lowest common denominator document format standard for interoperability",
I think we will have added significant value and endurance to Unicode.

The discussion seems to be trailing off -- I suppose I'll wait a few
days to see what else comes up and then attempt to write something up
(with full consideration of TR13).

- Frank


