Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 22 2003 - 05:49:49 CST


From: "John Cowan" <cowan@mercury.ccil.org>

> Kent Karlsson scripsit:
>
> > All of CR, LF, <CR, LF>, NEL, LS, PS, and EOF(!). (Assuming that the
> > encoding of the text file is recognised.)
>
> XML 1.0 treats CR, LF, and <CR, LF> as line terminators and reports
> them as LF.
> XML 1.1 will treat CR, LF, NEL, <CR, LF>, <CR, NEL>, and LS as line
> terminators and report them all as LF. PS is left alone, because of
> the bare possibility that it is being used as quasi-markup.
> [...]
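
To make the two rule sets quoted above concrete, here is a minimal Python
sketch. The function names are my own, and it only models the mapping of
line terminators to LF as described in the quote, not a real XML parser.

```python
import re

# XML 1.0 as quoted: CR, LF and <CR, LF> are reported as LF.
_XML10_EOL = re.compile("\r\n|\r")

# XML 1.1 as quoted: additionally NEL (U+0085), <CR, NEL> and LS (U+2028).
# Two-character sequences are listed first so they collapse to a single LF.
_XML11_EOL = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def normalize_xml10(text):
    """Map the XML 1.0 line terminators to LF (hypothetical helper)."""
    return _XML10_EOL.sub("\n", text)


def normalize_xml11(text):
    """Map the XML 1.1 line terminators to LF; PS (U+2029) is left alone."""
    return _XML11_EOL.sub("\n", text)
```

Note that PS passes through both functions unchanged, and that NEL and LS
are only touched by the XML 1.1 variant.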

I also have some old documents that use <VT>=U+000B instead of
<LF>=U+000A to increase the interparagraph spacing. This is still
mapped to the source '\v' character constant in C/C++ (but not in
Java, which has no '\v' escape and needs '\u000B' or '\013' instead).

Some applications still seem to use <VT> after <CR> to create soft line
breaks, in text files where paragraphs are normally ended by <CR><LF>.

CR was intended to allow overstriking the previously written (and
already complete) line, for example to underline some characters on that
line. This is what '\r' should imply in C, and in fact such a '\r' should
no longer be used in C, as it relies on adding visual attributes to the
previous text. That is why <CR> comes before the <LF> that terminates the
paragraph.
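
As an illustration of that overstrike usage, here is a small Python sketch.
It assumes a device where CR returns to column 0 without erasing anything;
the function name is hypothetical.

```python
def underlined_columns(line):
    """Return the columns where a pass after CR prints '_' over text.

    Each CR starts a new pass over the same physical line, so the
    first pass is the text and later passes are overstrikes.
    """
    passes = line.split("\r")
    base = passes[0]
    columns = set()
    for extra in passes[1:]:
        for col, ch in enumerate(extra):
            # Only count underscores that land on a printed character.
            if ch == "_" and col < len(base) and base[col] != " ":
                columns.add(col)
    return sorted(columns)
```

For example, "some bold text" followed by CR, five spaces and four
underscores underlines the word "bold" on such a device, while a modern
terminal would simply overwrite those characters.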

Of course there will still be many more uses in terminal emulation
protocols, which technically are not text file encodings, as they can
create dynamic effects, or can encode and render text in a non-logical
order, for example when emulating blinking or creating "ASCII art".
I consider that terminal emulation protocols (including printing protocols)
are supersets of the plain text format, but plain text should not attempt
to reproduce all the terminal "features".

So what is the status of <VT> in plain text files? For me it should
have the same behavior as <LF>, except that it does not imply an end
of paragraph. Is there a good replacement for this legacy control, one
that just means an explicit soft line break in the middle of a paragraph
(in which case it may occur instead of a <SPACE> and act as a word
separator, except if it occurs after a <soft hyphen>, where it
becomes ignorable)?
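
The behavior I have in mind could be sketched like this in Python. It is
only my proposed reading, not any standard: paragraphs end at <CR><LF> or
<LF>, <VT> (optionally preceded by <CR>) is a soft line break, and all
names below are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"
_PARA_END = re.compile("\r\n|\n")    # hard break: ends the paragraph
_SOFT_BREAK = re.compile("\r?\v")    # soft break: VT, optionally after CR


def split_paragraphs(text):
    """Return a list of paragraphs, each a list of soft-broken lines."""
    paragraphs = [p for p in _PARA_END.split(text) if p]
    return [_SOFT_BREAK.split(p) for p in paragraphs]


def reflow(lines):
    """Join the soft-broken lines of one paragraph into one logical line.

    A soft break acts as a word separator (a space), except after a
    soft hyphen, where it is ignorable and the lines are joined directly.
    """
    out = ""
    for index, line in enumerate(lines):
        if index > 0 and not out.endswith(SOFT_HYPHEN):
            out += " "
        out += line
    return out
```

With this reading, a renderer keeps the line structure returned by
split_paragraphs(), while a reformatter can call reflow() and rewrap the
paragraph freely.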


