Re: Plain text: Amendment 1

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Jul 06 1999 - 19:57:50 EDT


John Cowan wrote:

>
> The semantics of CR and LF in Unicode 2.x *are* the ambiguous
> ones inherited from the 7-bit controls; there are no other semantics.
> But this has been changed in Unicode 3.0: see UTR #13
> (http://www.unicode.org/unicode/reports/tr13/), which will be a
> normative part of Unicode 3.0.

This is not quite the case. UTR #13 *is* to be considered part of the Unicode
Standard, Version 3.0:

http://www.unicode.org/unicode/standard/versions/Unicode3.0-beta.html

However, UTR #13 is titled "Unicode Newline *Guidelines*" [emphasis
added]. There is no conformance specification and there are no
normative implications. Its stated scope is "a set of
recommendations for handling these characters so as to minimize the
effects on users." Think of UTR #13 as a late addition to Chapter 5,
Implementation Guidelines, that did not make it into the printed
text of the forthcoming The Unicode Standard, Version 3.0.

> Note well that UTR #13 does not
> solely prescribe the semantics of CR and LF during conversion to and
> from Unicode, but also the semantics of CR and LF *in* Unicode.

It makes suggestions. It does not normatively prescribe.

>
> As for HT and FF, nobody uses them incompatibly, and
> introducing new characters for them is supererogation at best.

I would agree with this.

> Mark Davis wrote:
>
> > A lot of the discussion of line termination relates to technical report #13.
> > Any suggestions for additional information for that report would be welcome.
>
> My suggestions:
>
> 1) The NEL character in the C1 set (0x85) is the ISO equivalent of
> EBCDIC NL (0x15) and this mapping is duly given in the EBCDIC code page
> mappings on the Unicode FTP site. The text should therefore advise
> applications to treat U+0085 (NL/NEL) as a newline, not U+0015 (NAK).

This was a typo/oversight in the text of UTR #13 and will be corrected.
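To make suggestion 1 concrete, here is a minimal sketch of newline handling that treats U+0085 (NEL) as a newline and leaves U+0015 (NAK) alone. It assumes, for illustration only, that the set of newline functions (NLF) is CRLF, CR, LF and NEL. The function names `split_nlf` and `normalize_nlf` are hypothetical and do not come from UTR #13.

```python
import re

# Assumed NLF set for this sketch: CRLF, CR, LF and NEL (U+0085).
# CRLF is listed first so a CR LF pair is consumed as ONE newline,
# not as two (CR, then LF).
_NLF = re.compile("\r\n|\r|\n|\u0085")


def split_nlf(text: str) -> list[str]:
    """Split text at every NLF. U+0015 (NAK) is deliberately NOT a separator."""
    return _NLF.split(text)


def normalize_nlf(text: str, target: str = "\n") -> str:
    """Rewrite every NLF as a single target sequence (default LF)."""
    return _NLF.sub(target, text)
```

Python's built-in `str.splitlines()` also breaks at U+0085, but it additionally breaks at VT, FF, U+001C..U+001E, U+2028 and U+2029. An explicit pattern keeps the chosen newline set visible and easy to adjust.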

>
> 2) There should be a warning that some old documents use bare
> CR (0x0D) to do underlining or other overstriking; an application
> that converts such text should do a more complex conversion, though
> treating bare CR as a NLF is marginally acceptable even for these
> documents (which may then wind up containing occasional lines
> with only spaces and underscores).

This is a good suggestion to add to the text of UTR #13.
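One possible form of the "more complex conversion" for bare-CR underlining is sketched below. It rests on a heuristic assumption: a bare CR followed only by spaces and underscores is an underline pass over the current line, and any other bare CR is a newline. Rendering underlines with U+0332 COMBINING LOW LINE is likewise a hypothetical choice for this sketch, not something UTR #13 specifies. `convert_bare_cr` is an illustrative name.

```python
import re

# Assumption: CRLF, LF and NEL (U+0085) end a line; any other CR is "bare".
_LINE_END = re.compile("\r\n|\n|\u0085")
# An underline pass: only spaces and underscores, with at least one underscore.
_UNDERLINE_PASS = re.compile(r"[ _]*_[ _]*")
COMBINING_LOW_LINE = "\u0332"


def _render(base: str, cols: set[int]) -> str:
    """Attach U+0332 to each character of base whose column was underlined."""
    return "".join(
        ch + COMBINING_LOW_LINE if i in cols else ch for i, ch in enumerate(base)
    )


def convert_bare_cr(text: str) -> str:
    """Convert bare-CR text, folding underline passes into combining low lines.

    A bare CR followed by spaces/underscores only is treated as an overstrike
    pass over the current line; any other bare CR is treated as a newline.
    Underscores past the end of the base text are dropped.
    """
    out: list[str] = []
    for physical in _LINE_END.split(text):
        segments = physical.split("\r")
        base, cols = segments[0], set()
        for seg in segments[1:]:
            if _UNDERLINE_PASS.fullmatch(seg):
                # Overstrike: remember which columns were underscored.
                cols.update(i for i, ch in enumerate(seg) if ch == "_")
            else:
                # Bare CR used as a plain newline: flush the current line.
                out.append(_render(base, cols))
                base, cols = seg, set()
        out.append(_render(base, cols))
    return "\n".join(out)
```

For example, `convert_bare_cr("Title\r_____\rnext line")` yields the underlined word "Title" on one line and "next line" on the next, while a file that simply uses CR as its line terminator comes out with one line per CR.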

--Ken

>
> --
> John Cowan http://www.ccil.org/~cowan cowan@ccil.org
> Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
> Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
> -- Coleridge / Politzer
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT