RE: Definitions (Was: Re: Charsets + encoding + codesets)

From: Murray Sargent (
Date: Fri Oct 10 1997 - 20:43:28 EDT

Unicode regards control characters, except for TAB (U+0009), as beyond
its scope (see The Unicode Standard, Version 2.0, Sec. 2.6).
Nevertheless, back in May we had a lively discussion on this alias about
the line separator character, covering CRLF, LF, CR, etc., and I think
we concluded that the next version of Unicode should include a summary
of how these control codes are commonly used in current software. Has
such a summary been written?
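
To make the conventions concrete, here is a minimal Python sketch (an
illustration only, not anything taken from the Standard) that splits text
on the line terminators commonly seen today: CR LF (DOS/Windows), LF
(Unix), CR (classic Mac OS), plus Unicode's U+2028 LINE SEPARATOR and
U+2029 PARAGRAPH SEPARATOR. The recognized set and the helper names
`split_lines` and `normalize_newlines` are hypothetical choices made for
this example. CR LF is tried first, so the pair counts as one separator
rather than two.

```python
import re

# Hypothetical set of line terminators. "\r\n" is listed first so the
# regex treats CR LF as a single unit instead of a CR followed by an LF.
_LINE_BREAK = re.compile("\r\n|\r|\n|\u2028|\u2029")


def split_lines(text: str) -> list[str]:
    """Split text into lines, counting CR LF as one separator."""
    return _LINE_BREAK.split(text)


def normalize_newlines(text: str, newline: str = "\n") -> str:
    """Rewrite every recognized terminator as one chosen convention."""
    return _LINE_BREAK.sub(newline, text)


if __name__ == "__main__":
    sample = "DOS\r\nUnix\nMac\rUnicode\u2028end"
    print(split_lines(sample))
    # ['DOS', 'Unix', 'Mac', 'Unicode', 'end']
    print(repr(normalize_newlines(sample)))
    # 'DOS\nUnix\nMac\nUnicode\nend'
```

Python's built-in `str.splitlines()` also treats CR LF as a single break,
but it recognizes a wider set of separators (for example VT, FF, and
U+0085 NEL), which shows that deciding what counts as a "line separator"
is itself a policy choice.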



> -----Original Message-----
> From: Tony Harminc []
> Sent: Friday, October 10, 1997 3:16 PM
> To: Multiple Recipients of
> Subject: Re: Definitions (Was: Re: Charsets + encoding + codesets)
> On 9 Oct 97 at 12:45, John Clews wrote:
> > Why not do, as ISO/IEC 10646 (and all ISO/IEC/JTC1/SC2 standards) do,
> > or at worst imply, and use the term "character" for those elements
> > which can be related to a single code point, and "graphic character"
> > for that class of characters which can be related to more than one
> > code point, using any permitted combining options?
> Where does this leave sequences like CR LF that many people would
> think of in at least some contexts as a unit ("line separator" or the
> like)? Or would you deny that they're "character"s at all?
> Tony Harminc
