Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: jon@hackcraft.net
Date: Wed Oct 22 2003 - 08:40:50 CST

Next message: Jill Ramonsky: "RE: Encoding for Fun (was Line Separator)"
Previous message: Peter Kirk: "Re: Backslash n [OT] was Line Separator and Paragraph Separator"
In reply to: Philippe Verdy: "Re: Backslash n [OT] was Line Separator and Paragraph Separator"
Next in thread: Philippe Verdy: "Re: Backslash n [OT] was Line Separator and Paragraph Separator"
Reply: Philippe Verdy: "Re: Backslash n [OT] was Line Separator and Paragraph Separator"

> > > So this legacy encoding of end-of-lines is now quite obsolete
> > > even on MacOS.
> >
> > I don't think it can be called "obsolete" as long as files generated using
> > that line end convention exist. Or, at least, applications that have an
> > operation for "read a line" will have to cope with it. (In other words,
> > all of the CR LF CRLF LFCR should mark an "end of line".)
>
> I was not speaking about the actual encoding of files into bytes, but
> only about the interpretation of '\n' or '\r' in C/C++, which was the real
> subject of the message.

ISO 14882 says that \n is LF, which is also the newline character as far as
C++ is concerned, and that \r is CR.

It does not define these relative to any given character set, so nothing in
the standard prevents char from using an implementation-defined character
encoding that is identical to, say, US-ASCII or a part of ISO 8859, except
that CR is encoded as 0x0A and LF as 0x0D. This would simplify converting
newline functions when writing text files on Macs, but could cause problems
elsewhere.
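
To make that concrete, here is a minimal C++ sketch (C++11 or later) that
inspects the values an implementation actually gives '\n' and '\r' instead of
assuming them. The helper names code_of, line_controls_match_ascii and
require_ascii_line_controls are my own, purely for illustration.

```cpp
#include <cassert>

// Numeric value of a char in the execution character set.
// Going through unsigned char avoids negative results where char is signed.
unsigned code_of(char c) { return static_cast<unsigned char>(c); }

// True only when this implementation encodes LF and CR the way US-ASCII does.
// The standard does not promise this; it is an assumption being tested.
bool line_controls_match_ascii() {
    return code_of('\n') == 0x0A && code_of('\r') == 0x0D;
}

// Hypothetical guard for code that writes raw 0x0A / 0x0D bytes:
// aborts in a debug build if the assumption does not hold.
void require_ascii_line_controls() {
    assert(line_controls_match_ascii());
}
```

On an ASCII-based implementation line_controls_match_ascii() returns true. On
the swapped encoding imagined above it would return false, which is exactly
the case where code that hard-codes 0x0A for "newline" would go wrong.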

However, because the universal-character-name escapes (\uXXXX and \UXXXXXXXX)
are defined relative to a particular encoding, namely ISO 10646, it would be an
error if ('\n' != '\u000A' || '\r' != '\u000D'). Whether this is implemented by
using the values 0x0A and 0x0D for LF and CR respectively (e.g. by using
US-ASCII or a proper superset of US-ASCII such as Unicode) or by converting
those values to another encoding when parsing isn't specified.
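
The comparison can be written down as a compile-time check. This is only a
sketch, and it assumes a C++11 or later compiler: C++98/03 made a
universal-character-name with a value below 0x20 ill-formed, while C++11 and
later allow it inside character and string literals. The function name
ucn_matches_simple_escapes is hypothetical.

```cpp
#include <cassert>

// Both sides are ordinary char literals. The compiler maps the
// universal-character-name to the execution character set, so the
// comparison holds whatever numeric value that set gives LF and CR.
static_assert('\n' == '\u000A', "LF escape should match U+000A");
static_assert('\r' == '\u000D', "CR escape should match U+000D");

// The same condition as a run-time function, for use in a test.
bool ucn_matches_simple_escapes() {
    return '\n' == '\u000A' && '\r' == '\u000D';
}
```

Note that this says nothing about whether the shared value is 0x0A or 0x0D,
only that the two spellings of each character agree.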

Given that C and C++ are intended to be neutral to encodings, and indeed they
do not even mandate that a char be an octet, or that a wchar_t be of the same
size as 2 or 4 chars, this is not surprising. The consequence is that we cannot
assume that conversion of character, wide character, and string literals to and
from Unicode will be trivial.
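
As a rough illustration of what has to be checked before treating such a
conversion as trivial, the following C++11 sketch reports the sizes involved.
The function names are mine; CHAR_BIT and WCHAR_MAX are the standard macros
from <climits> and <cwchar>.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Bits in one char. The standard guarantees at least 8, not exactly 8.
constexpr int bits_per_char() { return CHAR_BIT; }

// Bits occupied by one wchar_t. Implementation-defined; 16 and 32 are common.
constexpr std::size_t bits_per_wchar() { return sizeof(wchar_t) * CHAR_BIT; }

// True when a single wchar_t can represent every code point up to U+10FFFF.
// Where this is false, a code point may need more than one wchar_t.
constexpr bool wchar_holds_any_code_point() {
    return static_cast<unsigned long long>(WCHAR_MAX) >= 0x10FFFFull;
}

// Hypothetical sanity check for code that assumes octet-sized chars.
void require_octet_chars() { assert(bits_per_char() == 8); }
```

Code that copies a wchar_t string to UTF-32 element by element is only
correct where wchar_holds_any_code_point() is true, and even then only if the
implementation's wide encoding is actually ISO 10646, which these checks do
not establish.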

