Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: Philippe Verdy
Date: Fri Oct 24 2003 - 07:12:36 CST

From: <>

> > > Still, I stand by saying that \n is defined in C++ as LF and \r as CR,
> > > because that's sitting in front of me in black and white.
> >
> > Yes, true. But that does *not* mean that (int)'\n' can be counted on to
> > be 10
> Of course, given that any of a variety of character encodings could be in
> use, any guarantee that (int)'\n' == 10 would violate the definition of \n as

What is important in the standard is that the source author may assume that
'\n' will have the desired effect of terminating a line in text files, i.e.
the same effect produced by LF in a Unix environment. There is no such
requirement for binary files (so it does not apply to files opened with the
C library in binary mode, i.e. with the "b" flag; text mode is the default,
and the "t" flag is a common extension), and only text files are required to
support conversions where necessary to keep that effect:

- in CP/M, DOS, OS/2, Windows, this is done by the standard library linked
into the application, not by the OS.

- in MVS, VMS (and in some cases in NT with its optional support for
foreign filesystems), this may be done by the OS itself.

- on Mac Classic, this is done by the compiler itself, which binds \n to the
LF function (as defined by the language standard), where this LF is mapped to
CR in the Macintosh character set.
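
To make the text/binary distinction concrete, here is a minimal sketch in C.
The function names and the scratch file path are hypothetical, chosen only
for illustration. It writes one line in text mode, then counts what was
really stored by reading the file back in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Writes the two characters 'a' and '\n' in text mode, then counts the bytes
   actually stored by re-reading the file in binary mode.
   `path` is a hypothetical scratch file name. Returns -1 on I/O error. */
long stored_bytes_for_one_line(const char *path)
{
    FILE *f;
    long count = 0;

    assert(path != NULL);
    f = fopen(path, "w");            /* text mode: '\n' may be translated */
    if (f == NULL) return -1;
    if (fputs("a\n", f) == EOF) { fclose(f); return -1; }
    if (fclose(f) != 0) return -1;

    f = fopen(path, "rb");           /* binary mode: no translation */
    if (f == NULL) return -1;
    while (fgetc(f) != EOF) count++;
    fclose(f);
    return count;
}

/* Re-reads the same file in text mode: the program should see 'a', '\n',
   then EOF, whatever was stored. Returns 1 if so, 0 otherwise. */
int reads_back_as_one_line(const char *path)
{
    FILE *f = fopen(path, "r");
    int c1, c2, c3;

    if (f == NULL) return 0;
    c1 = fgetc(f);
    c2 = fgetc(f);
    c3 = fgetc(f);
    fclose(f);
    return c1 == 'a' && c2 == '\n' && c3 == EOF;
}
```

On a Unix system the first function should return 2, and on DOS or Windows
3 (the line end is stored as CR LF), while the second one should return 1 on
both: the program only ever sees '\n'.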

In any of these cases, the test "if ('\n' == 10)" will not necessarily be
true, even if the compiler conforms to the C99 or ISO C++ standard: this is
in the area where characters are promoted to integers, and where the C/C++
standards are not very clear, as they use simple integer promotion rules to
represent characters as integers instead of separating them semantically
(this gray area
does not exist in Java, where bytes and chars are separate datatypes, and
an int variable cannot be implicitly converted to a char: typecasting it to
char explicitly is required, even if Java still widens chars to int
implicitly and allows them to be handled as numeric with a defined but
limited arithmetic on them).
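
A small C sketch of the point, with hypothetical function names: the first
test is portable because it never names a code value, the second one is the
assumption being discussed, and the third shows the implicit promotion that
lets a char be used as a number without any cast.

```c
#include <assert.h>

/* Portable: compares against the character constant, never against 10. */
int is_line_end(int c)
{
    return c == '\n';
}

/* Non-portable assumption: true on ASCII-based implementations, but nothing
   in the language requires it (EBCDIC-based implementations use another
   value for '\n'). */
int newline_code_is_ten(void)
{
    return '\n' == 10;
}

/* Implicit promotion: a char takes part in int arithmetic without any cast,
   so the compiler cannot tell a "character" from a "number" here. */
int next_code(char c)
{
    return c + 1;
}

/* Checks that hold on any conforming implementation. */
void check_portable_newline_test(void)
{
    assert(is_line_end('\n'));
    assert(!is_line_end('a'));
}
```

Code that needs the value 10 specifically (for a wire protocol, say) should
probably write 10 or 0x0A and use binary mode, rather than rely on '\n'.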

I just think that the legacy usage of char as meaning a byte in C/C++ was an
initial design error, but we have to live with it, due to the number of
programs that have been written assuming it. But this is still in
conformance with the initial design of C/C++ for performance, where a byte
(as an integer type) is not even defined to have a fixed bit width (only a
minimum of 8 bits, given by CHAR_BIT).
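
As an illustration, a program can only query the width of its byte, not
choose it. A minimal sketch using <limits.h> (the function names are
hypothetical):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in a char (the C "byte") on this implementation. */
int bits_per_char(void)
{
    return CHAR_BIT;
}

/* Size of an int expressed in bits rather than in chars. */
size_t bits_per_int(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* The only guarantees the language gives about the byte. */
void check_byte_guarantees(void)
{
    assert(sizeof(char) == 1);   /* by definition, whatever its width */
    assert(CHAR_BIT >= 8);       /* a minimum, not a fixed value */
}
```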

This causes problems in systems like 4-bit microcontrollers, where the
addressable (and allocatable) memory unit is the nibble: on them, a C
compiler would have to assume that a char takes two nibbles, and thus two
memory units, so that an operation like c++ where c is a char* would need to
increment the physical address by 2: this would violate the usage of char as
an integer type, so instead, the compiler will handle the conversion between
integers and char* with a multiplication factor of 2, and differences of
char* will include a division by 2.
The problem with this scheme is that it becomes impossible to address a
single memory nibble, except through another compiler-specific native
datatype other than a char, such as __int4 or __nibble.
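
The scaling described above can be sketched in portable C by simulating the
addresses with plain integers. Everything here is hypothetical (the type
names, the functions, and the factor of 2 for an 8-bit char on a
nibble-addressed machine); it is not taken from any real compiler.

```c
#include <assert.h>

#define NIBBLES_PER_CHAR 2u   /* assumption: 8-bit char, 4-bit memory unit */

typedef unsigned nibble_addr; /* hypothetical: physical address, in nibbles */
typedef unsigned char_addr;   /* hypothetical: value of a char* as C sees it */

/* Conversion from a char* value to a physical address: multiply by 2. */
nibble_addr char_to_nibble_addr(char_addr p)
{
    return p * NIBBLES_PER_CHAR;
}

/* Reverse conversion; only meaningful for char-aligned physical addresses. */
char_addr nibble_to_char_addr(nibble_addr a)
{
    return a / NIBBLES_PER_CHAR;
}

/* Difference of two char* given as physical addresses: divide by 2. */
long char_ptr_diff(nibble_addr a, nibble_addr b)
{
    return ((long)a - (long)b) / (long)NIBBLES_PER_CHAR;
}

/* Odd physical addresses cannot be named by any char*: this is the single
   nibble that the scheme leaves unreachable. */
int is_char_addressable(nibble_addr a)
{
    return a % NIBBLES_PER_CHAR == 0;
}
```

For example, the char at C address 3 lives at nibbles 6 and 7, and nibble 7
on its own has no char* that designates it.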

The same problem occurs on systems where the memory or I/O space is
addressable by 1-bit units: to support these systems (most often
microcontrollers), the C compiler needs to add support for a __bit datatype,
and to handle the conversions between char* and __bit* pointers, notably when
computing pointer differences.
Whatever you think, all this should have been defined more precisely in
the standards, by designing two separate sets of datatypes, requiring explicit
rather than implicit conversions and promotion between them:

1) one set bound for performance or system integration, which maps to
addressable memory units, but without any requirement about the supported
range, including the supported native floating point numbers (with their
full value range and precision, even if it is a superset or subset of the
standard IEEE formats); for now C and C++ only define (though not completely)
this set of datatypes, with various portability issues (the standard C
datatype 'char' is among them, and also, regrettably, the ANSI 'wchar_t'
datatype).

2) one set bound for semantics, which maps enough addressable memory units
to support the standard ranges, and in which a "character" datatype (as
defined by Unicode) could be designed, as well as the standard IEEE floating
point numbers, and all their expected values, so that it becomes portable
across systems; Java only includes this set of datatypes, but most C/C++
compilers now come with a set of include headers mapping these standard types
in terms of native datatypes.
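
As a sketch of what the second set looks like in practice, the C99 header
<stdint.h> gives fixed-range names on top of the native types. The typedef
unicode_scalar and both functions below are hypothetical names for
illustration; int32_t itself is formally optional in C99, though nearly all
compilers provide it.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical "semantic" character type: wide enough for any Unicode
   code point, whatever the native char or wchar_t happen to be. */
typedef uint_least32_t unicode_scalar;

/* Returns 1 if v is a Unicode scalar value (0..0x10FFFF, no surrogates). */
int is_unicode_scalar(unicode_scalar v)
{
    return v <= 0x10FFFFu && !(v >= 0xD800u && v <= 0xDFFFu);
}

/* Explicit, checked conversion from the fixed-range type to the native int.
   Returns 1 and stores the value on success, 0 if it does not fit
   (possible where int is only 16 bits wide). */
int int32_to_native_int(int32_t v, int *out)
{
    if (v < INT_MIN || v > INT_MAX)
        return 0;
    *out = (int)v;
    return 1;
}

/* int32_t has exactly 32 value-plus-sign bits and no padding. */
void check_fixed_width(void)
{
    assert(sizeof(int32_t) * CHAR_BIT == 32);
}
```

The range check in int32_to_native_int is the kind of explicit conversion
between the two sets argued for above; on most current systems it can never
fail, and a compiler may warn that the comparison is always false.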
