RE: Line Separator and Paragraph Separator

From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Tue Oct 21 2003 - 09:12:50 CST


Hmm.

Well, I can't say I've ever found a use for putting either a C0 or a C1
control into a text file, beyond the usual CR, LF and TAB. My code also
often considers FF to be whitespace, although I've never actually
(knowingly) encountered it in a real text file.

I would have thought that low codepoints would be highly valuable
commodities. Though some may have exotic uses, my experience is that
most of them don't seem to be used. In the past (that is, in the
pre-Unicode days, or when specifically working with ASCII or Latin-1
strings), I have tended to treat the control characters rather like the
Private Use Area - a space in which I can do what I want so long as I
don't expect the "outside world" to agree. I've even invented (and used)
some 8-bit encodings which leave the whole of Latin-1 unchanged (apart
from the C1s) and use C1 characters a bit like "surrogate pairs" to
reach the rest. (I didn't expect this to catch on; it was for internal
use only.)
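
To make the idea concrete, here is a minimal sketch in Python of one way
such an encoding could work. It is not the scheme described above, whose
details are not given; the layout is an assumption chosen for illustration.
ASCII and the printable Latin-1 range (0xA0 to 0xFF) pass through as single
bytes, and every other code point is written as a run of C1 bytes carrying
four bits each: 0x80-0x8F means "more follows", 0x90-0x9F marks the last
group. The names encode and decode are hypothetical.

```python
def encode(text: str) -> bytes:
    """Encode text with a hypothetical Latin-1-preserving 8-bit scheme."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or 0xA0 <= cp <= 0xFF:
            # ASCII and printable Latin-1 pass through unchanged.
            out.append(cp)
            continue
        # Everything else (including the C1 code points themselves) is
        # split into 4-bit groups, most significant group first.
        nibbles = []
        while True:
            nibbles.append(cp & 0xF)
            cp >>= 4
            if cp == 0:
                break
        nibbles.reverse()
        for n in nibbles[:-1]:
            out.append(0x80 | n)        # 0x80-0x8F: more groups follow
        out.append(0x90 | nibbles[-1])  # 0x90-0x9F: final group
    return bytes(out)


def decode(data: bytes) -> str:
    """Invert encode(); raises ValueError on a truncated escape run."""
    chars = []
    acc = 0
    pending = False
    for b in data:
        if 0x80 <= b <= 0x8F:
            acc = (acc << 4) | (b & 0xF)
            pending = True
        elif 0x90 <= b <= 0x9F:
            chars.append(chr((acc << 4) | (b & 0xF)))
            acc = 0
            pending = False
        else:
            if pending:
                raise ValueError("truncated escape sequence")
            chars.append(chr(b))
    if pending:
        raise ValueError("truncated escape sequence")
    return "".join(chars)
```

With this layout, pure Latin-1 text is byte-for-byte identical to its
Latin-1 encoding, while any other character costs between one and six
bytes. The price is that real C1 controls can no longer appear as single
bytes and must be escaped like everything else.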

I'm really surprised that Unicode "didn't want to go there".
Still, that's life.
Jill

> -----Original Message-----
> From: John Cowan [mailto:cowan@mercury.ccil.org]
> Sent: Tuesday, October 21, 2003 1:58 PM
> To: Jill Ramonsky
> Cc: unicode@unicode.org
> Subject: Re: Line Separator and Paragraph Separator
>
>
> Jill Ramonsky scripsit:
>
> > I wonder why it was not felt a good idea at the time (the early 1990s)
> > to have defined LS and PS, but with codepoints somewhere in the range
> > U+00 to U+1F.
>
> Pretty much because other ISO standards specify the meaning of that set,
> and Unicode/ISO 10646 very much didn't want to go there. I say "meaning",
> but there are actually multiple possible meanings, though most of them are
> fairly consistent.


