RE: Line Separator and Paragraph Separator

From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Tue Oct 21 2003 - 06:05:09 CST


Interesting.

I do strongly suspect, however, that at least part of the reason that LS
and PS didn't take off was that they are more than seven bits wide, and
hence cannot be transported in plain ASCII text.
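
To make the width point concrete, here is a minimal Python sketch. The names SEPARATORS, fits_in_ascii and utf8_hex are invented for this illustration; the code points themselves are the standard ones for LS, PS and NEL.

```python
# Illustrative only: helper names are hypothetical, code points are standard.
SEPARATORS = {
    "LS": "\u2028",   # LINE SEPARATOR
    "PS": "\u2029",   # PARAGRAPH SEPARATOR
    "NEL": "\u0085",  # NEXT LINE (a C1 control)
}


def fits_in_ascii(ch: str) -> bool:
    """Return True if the character can be encoded as 7-bit ASCII."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of the character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


for name, ch in SEPARATORS.items():
    print(f"{name}: U+{ord(ch):04X} ascii={fits_in_ascii(ch)} utf8={utf8_hex(ch)}")
```

None of the three fits in ASCII, whereas CR and LF both do; in UTF-8, LS and PS take three bytes each and NEL takes two.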

I wonder why it was not felt a good idea at the time (the early 1990s)
to define LS and PS with codepoints somewhere in the range U+0000 to
U+001F. I think it would have been fairly easy to find some mostly
unused ones, for example U+0010 and U+0011. The reason? SMTP traffic is
(by definition) transmitted across 7-bit-wide channels. HTTP traffic is
transmitted across 8-bit-wide channels. In the internet world, "newline"
is CRLF, and everything else has to be converted to it for transmission
across the internet.
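
As a rough sketch of that conversion step, the following Python function rewrites every line break it recognizes as CRLF. The function name to_crlf is hypothetical, and treating NEL, LS and PS as plain line breaks is an assumption made for the example, not something any mail standard prescribes.

```python
import re

# Match CRLF first so it is not counted as two separate breaks.
# Assumption: NEL (U+0085), LS (U+2028) and PS (U+2029) are treated as
# ordinary line breaks alongside CR and LF.
_ANY_BREAK = re.compile("\r\n|[\r\n\u0085\u2028\u2029]")


def to_crlf(text: str) -> str:
    """Replace every recognized line break in text with CRLF."""
    return _ANY_BREAK.sub("\r\n", text)
```

Note that this is lossy: once everything has become CRLF, the difference between a line break and a paragraph break can no longer be recovered.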

Personally, I would have added a THIRD kind of separator, a "soft line
break". The reason? Some email relays insist on a "maximum line length"
of emails. In these days of mime types and attachments, we inject CRLF
into the files to keep such relays happy, but the renderer ignores them
as "just whitespace". If we had had a "soft line break" character (in
the range U+0000 to U+001F), we could have retrofitted it into existing
email protocols. Had we done this, SLB could have been considered "just
whitespace", while LS and PS would have been non-ignorable in HTML (and
in fact, equivalent to <br> and <p> respectively).
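
Purely as a thought experiment, a renderer following that scheme might look like the Python sketch below. No SLB character exists; the code point chosen for it here (U+0012) is invented for the example, as is the function name to_html. LS and PS use their real code points, U+2028 and U+2029.

```python
import html

LS = "\u2028"   # LINE SEPARATOR (real code point)
PS = "\u2029"   # PARAGRAPH SEPARATOR (real code point)
SLB = "\x12"    # HYPOTHETICAL soft line break; no such character is defined


def to_html(text: str) -> str:
    """Render text so that PS ends a paragraph, LS becomes <br>, and the
    hypothetical SLB collapses to an ordinary space."""
    paragraphs = []
    for para in text.split(PS):
        if not para:
            continue  # skip empty paragraphs, e.g. after a trailing PS
        body = html.escape(para).replace(SLB, " ").replace(LS, "<br>")
        paragraphs.append(f"<p>{body}</p>")
    return "".join(paragraphs)
```

For example, the input "one" + SLB + "two" + LS + "three" + PS + "four" would render as <p>one two<br>three</p><p>four</p>.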

I'm not surprised that NEL never caught on though.

Jill

> -----Original Message-----
> From: Frank da Cruz [mailto:fdc@columbia.edu]
> Sent: Monday, October 20, 2003 4:53 PM
> To: Jill Ramonsky
> Cc: unicode@unicode.org
> Subject: Re: Line Separator and Paragraph Separator
>
>
> At some point in the early 1990s, the thinking was that ASCII control
> characters were included in Unicode only for round-trip compatibility
> with existing character sets, but their semantics were undefined, and
> anyway they were not needed since they were from the bygone days of
> terminals and similar antique contraptions, whereas in modern times all
> text is "flowed" by "smart rendering engines".
>
> Ten years on, the terminal-to-host model is still widely used, as is
> text with hard line breaks, but to convince the skeptics and
> ultra-modernists that line breaks were still a useful concept, I
> mentioned line-oriented programming languages (such as Fortran), and
> poetry. Hence the line separator.
>
> Later everybody realized you couldn't stamp out ASCII control
> characters, so we're still using them; LS and PS never caught on as far
> as I know. LS would obviously have been an improvement over the
> existing situation, in which different line separators (CR, LF, CRLF)
> are used on platforms that would otherwise have compatible text record
> formats; this causes no end of confusion to this day.
>
> At some point after Unicode 2.0, the C1 controls were adopted from
> ISO 6429, in which we have a Next Line control (NEL, U+0085), which
> might also have served the purpose, but it never caught on either.
>
> - Frank
>


