Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 21 2003 - 12:33:07 CST


From: Jill Ramonsky

> I would be more than grateful if someone could point me
> in the direction of a DEFINITIVE specification which claims
> this is not the case, that the interpretation of "\n" as
> anything other than LF may be considered conformant
> behaviour.

If you have programmed for MacOS, you may know that
C compilers for that platform generate U+000A=LF for '\r'
and U+000D=CR for '\n'. This conforms to the common
use of CR as the standard line separator in text files on
MacOS.

With MacOSX, I think this has changed. The CR convention was
the encoding used by the very limited SimpleText program,
which restricted file sizes and did not support Unicode, but
only the native MacOS system character set and encoding.

This changed quite a long time ago: C compilers simply
don't care whether source lines are terminated by CR, LF, or a
combination of them. Consoles are also common now, and
tools for MacOS are ported from other environments.

So this legacy encoding of end-of-lines is now quite obsolete
even on MacOS.

However, on IBM MVS systems, which are EBCDIC-based,
end-of-lines are encoded by an NL character, not LF and not
even CR. On them, the C/C++ language '\n' is mapped to NL,
which is the normal character used in console applications to
display end-of-lines.
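
To check what a particular C implementation does, one can simply look
at the numeric values of the two constants in its execution character
set. The following is only a minimal sketch; the function name
describe_newline_constants is made up for this illustration and is not
part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: formats the numeric values that this C
   implementation assigns to '\n' and '\r' in its execution character
   set. The C standard does not fix these values, so the result depends
   on the compiler and platform. Returns what snprintf returns. */
int describe_newline_constants(char *out, size_t size)
{
    return snprintf(out, size, "'\\n' = 0x%02X, '\\r' = 0x%02X",
                    (unsigned)(unsigned char)'\n',
                    (unsigned)(unsigned char)'\r');
}
```

On an ASCII-based implementation the buffer ends up holding
"'\n' = 0x0A, '\r' = 0x0D". On the platforms described above one would
expect other values, which is the point of the comparison.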

In Java however, the mapping of '\n' and '\r' constants is NOT
bound to the underlying system, but permanently assigned to
LF and CR respectively. It's up to the console emulation layer
to adapt and display end-of-lines on the console from an input
'\n' (LF) at run-time. It's up to the File class to transcode the
'\n' constant to a physical end-of-line in actual text files.

In fact, this also occurs in C/C++ on CP/M, DOS and Windows systems,
where an internal LF is converted to the sequence CR+LF by the
FILE* interface of <stdio.h> for files opened in text mode.
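
The text-mode translation can be observed by writing one line in text
mode and counting the stored bytes in binary mode. This is a rough
sketch, not a definitive test; the function names
stored_size_of_one_line and reads_back_as_newline and the file path
passed in are assumptions made for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "a\n" to `path` in text mode, then
   reopens the file in binary mode and returns the number of bytes
   actually stored, or -1 on error. Typically 2 on Unix-like systems
   (a, LF) and 3 on DOS/Windows (a, CR, LF). */
long stored_size_of_one_line(const char *path)
{
    FILE *f = fopen(path, "w");            /* text mode */
    if (f == NULL) return -1;
    if (fputs("a\n", f) == EOF) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;

    f = fopen(path, "rb");                 /* binary mode: raw bytes */
    if (f == NULL) return -1;
    long n = 0;
    while (fgetc(f) != EOF) n++;
    fclose(f);
    return n;
}

/* Hypothetical helper: reads the same file back in text mode and
   returns 1 if the program sees exactly "a\n", whatever bytes are
   stored on disk; returns 0 otherwise. */
int reads_back_as_newline(const char *path)
{
    char buf[8];
    FILE *f = fopen(path, "r");            /* text mode */
    if (f == NULL) return 0;
    char *p = fgets(buf, sizeof buf, f);
    fclose(f);
    return p != NULL && strcmp(buf, "a\n") == 0;
}
```

Whatever the first function returns on a given system, the second
should still report that the program reads back a single '\n', since
the translation is undone when the file is read in text mode.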

So when discussing characters here, don't use '\n' or '\r'
when you in fact mean LF=U+000A or CR=U+000D, unless you're
using a language like Java that maps these programming
constants to actual run-time characters.


