Re: CRLF vs. LF (was Re: Unicode and end users)

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Feb 21 2002 - 23:53:03 EST


Markus Scherer <markus.scherer@jtcsv.com> wrote:

> I think there is no doubt very high interest in editors - especially
> system default editors like notepad - that can both
> - read plain text using any style line breaks (see Unicode TR)
> - write plain text at least in LF or CRLF if not all the others too
> (CR, NL, LS, PS)

SC UniPad can read and write text files:
- using LF, CR, CRLF, or LS (U+2028);
- in UTF-8, UTF-16 (LE or BE), or UTF-32 (LE or BE), among many others;
- with or without a BOM.

One thing it cannot do is maintain different line separators in a single
file. It converts them all internally to U+2028 and writes them out
consistently according to user preference. (I don't know why one would
want different line separators in a single file, but maybe someone can
think of a reason.)
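
For anyone who wants to see the idea in code, here is a minimal Python
sketch of that kind of normalization. It is not UniPad's actual
implementation, which I have not seen; the names to_internal and
to_external are made up for illustration, and I am assuming that only
LF, CR, CRLF, and LS count as line breaks, so NL (U+0085) and PS
(U+2029) pass through untouched.

```python
import re

LS = "\u2028"  # LINE SEPARATOR, used here as the single internal line break

# CRLF is listed first so that it is matched as one break, not as CR then LF.
_BREAKS = re.compile("\r\n|\r|\n|\u2028")


def to_internal(text: str) -> str:
    """Replace every recognized line break with U+2028."""
    return _BREAKS.sub(LS, text)


def to_external(text: str, preferred: str = "\r\n") -> str:
    """Write every internal U+2028 out as the user's preferred line break."""
    return text.replace(LS, preferred)


# A file that mixes all four conventions comes out with a single one.
mixed = "one\r\ntwo\nthree\rfour\u2028five"
internal = to_internal(mixed)
print(to_external(internal, "\n").split("\n"))
# prints ['one', 'two', 'three', 'four', 'five']
```

Once the text has been through to_internal, the original mix of
separators is gone, which is why a scheme like this cannot preserve
different line separators within one file.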

Another thing it cannot do is use PS (U+2029) as a line separator, or for
anything else for that matter. The 0.96 help file says, "This will change
in a future version." Not many files use PS, but the Chinese "Five Books"
from the Unicode web site do:

    http://www.unicode.org/Public/TEXT/FIVEBOOKS/

Markus, when you say "NL" do you mean U+0085? What text files use this
convention?

-Doug Ewell
 Fullerton, California


