Re: Backslash n [OT] was Line Separator and Paragraph Separator

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Oct 22 2003 - 15:33:14 CST


From: <jon@hackcraft.net>
> However because the universal-character-name escapes (\uXXXX and \UXXXXXXXX)
> are defined relative to a particular encoding, namely ISO 10646, it would be an
> error if ('\n' != '\u000A' || '\r' != '\u000D'). Whether this is implemented by
> using the values 0x0A and 0x0D for LF and CR respectively (e.g. by using
> US-ASCII or a proper superset of US-ASCII such as Unicode) or by converting
> those values to another encoding when parsing isn't specified.

You're wrong here:
Neither Unicode nor ISO specifies that the source constants '\n' and '\r',
which are written with an escaping mechanism made of _multiple_ distinct
characters specific to some programming languages, must be bound at compile
time or run time to the LF or CR character.

The '\n' and '\r' conventions are specific to each language, and C/C++ use
conventions distinct from those of Java, for example. This is not an
encoding issue but a language feature.
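To make the "multiple distinct characters" point concrete, here is a small C sketch. The function names escaped_length and spelled_length are hypothetical, chosen only for illustration. In source, \n is spelled as two characters (a backslash and the letter n), but the compiler translates it into a single character whose value comes from its own execution character set:

```c
#include <string.h>

/* The escape sequence \n is written with two source characters,
   but the compiler replaces it with ONE character. Its numeric
   value is chosen by the implementation, not by Unicode or ISO. */
static size_t escaped_length(void)
{
    return strlen("\n");   /* one character after translation */
}

/* Escaping the backslash keeps both characters literally:
   a backslash followed by the letter 'n'. */
static size_t spelled_length(void)
{
    return strlen("\\n");  /* two characters */
}
```

Both results hold on every conforming C compiler; what varies between compilers is only the numeric value of the single character produced by "\n".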

In C or C++, if you need your program to portably produce exactly LF or
exactly CR, you MUST NOT use the '\n' and '\r' constants. Use numeric
escapes in strings instead (i.e. "\012" or "\x0A" for LF, and "\015" or
"\x0D" for CR), or simply integer constants for the char, int, or wchar_t
datatypes (i.e. 10, 012 or 0x0A for LF, and 13, 015 or 0x0D for CR). Then
make sure that your run-time library maps these values correctly for your
run-time locale or system environment. You may need to specify file-open
flags to control this mapping, such as the standard "b" (binary) mode flag
in fopen calls, or the non-standard "t" (text) flag that some run-time
libraries accept.
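A minimal sketch of that advice, assuming the goal is to write a line ending in an exact CR LF byte pair. The macro names LF_BYTE and CR_BYTE and the helper write_crlf_line are hypothetical. The file is opened with the standard "wb" mode so that the C run-time library performs no newline translation on output:

```c
#include <stdio.h>

/* Numeric escapes name the code value directly, independent of
   the values the compiler chooses for '\n' and '\r'. */
#define LF_BYTE '\x0A'  /* same value as '\012' or the integer 10 */
#define CR_BYTE '\x0D'  /* same value as '\015' or the integer 13 */

/* Hypothetical helper: writes text followed by CR LF to path.
   Binary mode ("wb") disables text-mode newline translation.
   Returns 0 on success, -1 on failure. */
static int write_crlf_line(const char *path, const char *text)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    fputs(text, f);
    fputc(CR_BYTE, f);
    fputc(LF_BYTE, f);
    return fclose(f) == 0 ? 0 : -1;
}
```

Opening the same file in text mode ("w") and writing '\n' instead would let the run-time library substitute whatever line terminator the platform uses, which is exactly the mapping the advice above asks you to control.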

So a test like "if ('\n'==10)" may or may not be true in C/C++, depending
on the compiler implementation (not on the system platform as such),
whereas the same test in Java is always true.
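The test above can be wrapped in a couple of probe functions, sketched here under the assumption that you simply want to see what a given compiler chose. The names newline_is_lf, carriage_return_is_cr and report_escape_values are hypothetical:

```c
#include <stdio.h>

/* Returns 1 if this compiler maps '\n' to the LF code value 10.
   C leaves the value of '\n' to the implementation, so a result
   of 0 is equally conforming. */
static int newline_is_lf(void)
{
    return '\n' == 10;
}

/* Same check for '\r' against the CR code value 13. */
static int carriage_return_is_cr(void)
{
    return '\r' == 13;
}

/* Prints the numeric values this particular compiler chose. */
static void report_escape_values(void)
{
    printf("'\\n' = %d, '\\r' = %d\n", (int)'\n', (int)'\r');
}
```

On common ASCII-based compilers both probes return 1, while an EBCDIC-based implementation would not map '\n' to 10. In Java, by contrast, '\n' is defined by the language as U+000A, so the equivalent check cannot fail.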



This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST