Re: Plain text: Amendment 1

From: Kermit Software Support ([email protected])
Date: Sun Jul 04 1999 - 19:18:15 EDT

Next message: John Cowan: "Re: dotless j"
Previous message: John Cowan: "Re: dotless j"
In reply to: [email protected]: "Re: Plain text: Amendment 1"
Next in thread: Jonathan Coxhead: "Re: Plain text: Amendment 1"

Keld wrote:
>
> Frank wrote:
> >
> > 4. Paragraph breaks are indicated by two successive Line Separators
> > or by Paragraph Separator, U+2029.
> >
> > Change (4) to:
> >
> > 4. Paragraph breaks are indicated by Paragraph Separator, U+2029.
> >
> > Add to (3):
> >
> > A blank line is indicated by two successive Line Separators.
> > Two blank lines are indicated by three of them, etc.
> >
> > This is to allow paragraphs like this one, which contain embedded
> > "displays" set off by blank lines that are NOT paragraph separators.
>
> Could one not use C0 or C1 characters for these, so that the conventions
> could equally apply to, say, 8859 character sets?
>
They could be, but I think we want to standardize on true Unicode characters
whenever we can, since we have the power to define their semantics. The C0
and C1 sets are included for compatibility with existing character sets over
which the Unicode Consortium has no control, and over which we have been
haggling for the past few days ("the Mac does this, the PC does that, UNIX
does something else"...)
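To make the amended rules (3) and (4) concrete, here is a minimal Python sketch of how a reader might parse such text. It assumes the text is already decoded into Unicode code points, and the function name parse_plain_text is hypothetical, not part of any proposal in this thread. Paragraphs are split on PS (U+2029) and lines on LS (U+2028), so a run of k successive LS characters yields k-1 blank lines, which stay inside the paragraph as "displays" rather than ending it.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def parse_plain_text(text: str) -> list[list[str]]:
    """Split text into paragraphs (on PS), then each paragraph into lines (on LS).

    Under the amended rule (3), two successive LS characters produce one
    blank line, three produce two, and so on. str.split keeps the empty
    strings between adjacent separators, so blank lines survive as ""
    entries inside their paragraph instead of starting a new one.
    """
    return [para.split(LS) for para in text.split(PS)]


# A paragraph containing an indented display set off by blank lines,
# followed by a second paragraph.
doc = ("First line" + LS + "second line" + LS + LS + "  display" + LS + LS
       + "rest" + PS + "Next paragraph")
print(parse_plain_text(doc))
```

Only PS ends a paragraph here; the blank lines around the display are ordinary lines of the first paragraph, which is the distinction the amendment is meant to preserve.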

Anyway, we can't go back and change existing Latin-Alphabet or PC Code Page
files to use consistent record formats -- that's an operating system and
programming language issue, not to mention a conversion task that not even
Hercules (or Xena) could handle.

> 3) could be something like one out of 3:
>
> 1. CR
> 2. LF
> 3. CR LF
>
This is exactly why we should use LS rather than any of the above in
Unicode text. That way, converting existing 8-bit text to Unicode will have
the happy by-product of erasing these differences.
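A minimal Python sketch of that conversion, assuming each legacy line terminator (CR LF, CR, or LF) maps to exactly one LS. The function name legacy_to_unicode and the default ISO 8859-1 encoding are illustrative assumptions. Whether a trailing terminator should become a trailing LS or be dropped is not settled in this thread, so the sketch simply keeps it. CR LF is matched first so that it becomes a single LS rather than two.

```python
import re

LS = "\u2028"  # LINE SEPARATOR

# Alternation order matters: try CR LF before a lone CR or lone LF.
_LEGACY_EOL = re.compile(r"\r\n|\r|\n")


def legacy_to_unicode(data: bytes, encoding: str = "iso-8859-1") -> str:
    """Decode 8-bit text and replace every legacy line terminator with LS."""
    text = data.decode(encoding)
    return _LEGACY_EOL.sub(LS, text)


unix = legacy_to_unicode(b"one\ntwo\n")
mac = legacy_to_unicode(b"one\rtwo\r")
dos = legacy_to_unicode(b"one\r\ntwo\r\n")
print(unix == mac == dos)
```

The three platform conventions come out identical. The same function works for a PC code page by passing, for example, encoding="cp437".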

As noted previously, I would not object to adding two more "control
characters" to Unicode to remove our dependence on C0 and C1 completely:

1. UHT "Unicode Horizontal Tab", which is just like C0 HT except that
    the tabstops are well-defined (should the tabbing concept be
    carried forward into Unicode Plain Text, rather than using only
    spaces). How to define them is, of course, another question.

2. UFF "Unicode Form Feed", like C0 Form Feed, except not in C0.
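For item 1, the open question is how the tab stops are defined. One possible reading, sketched in Python purely as an assumption: fixed stops every 8 columns (a common terminal default, but not anything agreed here), with one column per code point and the column count reset at LS or PS. The name expand_tabs is hypothetical. Combining marks and double-width characters are deliberately ignored, which a real definition would have to address.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def expand_tabs(text: str, tab_width: int = 8) -> str:
    """Replace each HT with spaces up to the next tab stop.

    Assumes (hypothetically) stops every tab_width columns, counts one
    column per code point, and restarts the column at LS or PS.
    """
    out = []
    col = 0
    for ch in text:
        if ch == "\t":
            pad = tab_width - col % tab_width  # distance to next stop
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col = 0 if ch in (LS, PS) else col + 1
    return "".join(out)


print(repr(expand_tabs("ab\tc")))
```

For ASCII text without line breaks this gives the same result as Python's built-in str.expandtabs. The difference is that it also treats LS and PS as line boundaries, which str.expandtabs does not.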

I can't think of any applications for C0 Form Feed other than page feed
or page eject, or the analogous action on video terminals, namely clear
screen. But I'm sure that C0 FF has been misused in ways I have never heard
of, and therefore a more clearly defined Unicode version might be warranted.

However, I'm perfectly happy to stick with C0 HT and FF as long as they
are given precise definitions for Unicode Plain Text, and nobody says
"legacy" when referring to them :-)

Whatever is chosen, let's keep it simple.

- Frank


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT