Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Wed Oct 22 2003 - 12:56:28 CST


   On 22 Oct 2003, at 6:53, John Cowan wrote:

> Kent Karlsson scripsit:
>
> > All of CR, LF, <CR, LF>, NEL, LS, PS, and EOF(!). (Assuming that the
> > encoding of the text file is recognised.)
>
> XML 1.0 treats CR, LF, and <CR, LF> as line terminators and reports
> them as LF.
>
> XML 1.1 will treat CR, LF, NEL, <CR, LF>, <CR, NEL>, and LS as line
> terminators and report them all as LF. PS is left alone, because of
> the bare possibility that it is being used as quasi-markup.
>
> I can't imagine why EOF should be called a line terminator, except
> in the sense that a "read a line" operation should obviously not attempt
> to read past EOF. Calling it a line terminator means that every
> document is forced into the mold of being an integral number of lines
> long, regardless of the facts.
>
> > Don't know about <LF, CR>. I think that should be two line ends.
>
> I agree. I don't know any system that uses this sequence.

   The BBC Micro---well-known to a generation of British schoolchildren---used
this sequence. You can probably find files in that encoding on some 5.25in
floppies in DFS format in some store cupboards somewhere (for what that's
worth).

   I wrote a little line-conversion (f)utility recently, and the (minimal)
research I did suggested that the following was a complete set of line
terminators that might be found in practice:

      CRLF
      CRFF
      CRVT
      LF
      FF
      VT
      CR
      LFCR
      NEL
      CRCRLF
      NUL
      end of file (not control-D or control-Z, I mean the real end-of-file)
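   As a rough illustration only (this is not the utility described above), a
minimal Python sketch of splitting Latin-1 bytes on that set might look like
the following. It assumes a longest-match-first scan, so that CRCRLF wins over
CRLF and CRLF wins over CR. The names TERMINATORS and split_lines are
hypothetical.

```python
from __future__ import annotations

# Hypothetical table of the terminators listed above, as Latin-1 bytes.
# Longer sequences must be tried before their prefixes (CRCRLF before
# CRLF before CR), so the table is sorted longest first.
TERMINATORS = sorted(
    [
        b"\r\n",      # CRLF
        b"\r\x0c",    # CRFF
        b"\r\x0b",    # CRVT
        b"\n",        # LF
        b"\x0c",      # FF
        b"\x0b",      # VT
        b"\r",        # CR
        b"\n\r",      # LFCR (BBC Micro)
        b"\x85",      # NEL (0x85 in Latin-1)
        b"\r\r\n",    # CRCRLF (buggy software)
        b"\x00",      # NUL
    ],
    key=len,
    reverse=True,
)


def split_lines(data: bytes) -> list[bytes]:
    """Split data into lines, treating every sequence in TERMINATORS
    (and the real end of file) as a line end. Terminators are dropped."""
    lines = []
    start = 0  # index where the current line began
    i = 0
    while i < len(data):
        for term in TERMINATORS:
            if data.startswith(term, i):
                lines.append(data[start:i])
                i += len(term)
                start = i
                break
        else:
            i += 1  # ordinary character; keep scanning
    # End of file ends a line only if text is left over; otherwise a file
    # ending in a terminator would gain a spurious empty last line.
    if start < len(data):
        lines.append(data[start:])
    return lines
```

   With LFCR accepted, b"a\n\rb" splits into two lines, not the three you
would get by treating <LF, CR> as two line ends; that is the trade-off being
discussed in the quoted text.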

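   CRLF is derived from standard printer technology: CR returns the carriage
to the left margin and LF advances the paper one line. CRFF and CRVT are how
you would get the printer to move by more than a line (to the top of the next
page, or to the next vertical tab stop).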

   More recent practice allows LF, FF or VT to be used solo. If sent directly
to a printer they still terminate the line, though the printed output would
look different since the "carriage" would not "return".

   CR is from the classic Mac OS, LFCR is from the BBC Micro. NEL is a
dedicated character with the right meaning.

   CRCRLF is generated by some buggy software I have to put up with. And I
can't remember why I wanted to allow NUL. I probably reasoned that (in its C
role as "end of string") it must terminate a line, just as EOF does.

   This is all for Latin-1 only. Obviously, it's pretty idiosyncratic, but it
looks like I missed at least CRNEL---any others?

   I think someone mentioned the IND ("index") character recently in the
context of line-breaking. I'd like to ask, what is its intended function?

        /|
 o o o (_|/
        /|
       (_/


