Re: Concerning mathematics (and introducing other issues concerning mark up)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Sat Mar 09 2002 - 15:26:08 EST


An interesting possibility that would solve the problem would be for Unicode
to add code points for the arrow parentheses that are described on pages 29
and 30 of The Bytext Standard, which is available at the
http://www.bytext.org website. The subscript text could
then be entered using ordinary text characters preceded by an ARROW
PARENTHESIS LEFT DOWN and followed by an ARROW PARENTHESIS RIGHT DOWN.

On Saturday 26 January 2002 I posted to this unicode@unicode.org mailing
list a post about Bytext in which one paragraph was as follows.

quote

I downloaded a copy of Mr Miller's pdf format document and spent a short
while looking through it. I found on pages 29 and 30 a section entitled
"Arrow parentheses" which, if I may be permitted to say so, is excellent. I
began to think of the possibilities that the eight characters which Mr
Miller has invented could become widely used as a way to express powers,
subscripts, upper limits of definite integrals and summations and lower
limits of definite integrals and summations in unicode plain text files.
Perhaps Mr Miller might be encouraged and assisted to designate eight code
points of the unicode private use area for their definition as a practical
way to get them available for people to try out experimentally in text
editors and graphical processors, perhaps U+E300 through to U+E307 would be
a good choice, in the order that the glyphs appear on pages 29 and 30 of the
pdf document. I hope that, in due course, arrow parentheses might become
candidates for upgrading to become regular unicode characters, if such an
upgrading of characters that quite specifically would display one way in a
text editor and another way in a graphical display is compatible with the
underlying concepts of unicode. If I understand arrow parentheses
correctly, someone authoring a document that involves a definite integral
would, using the four arrow parentheses that have double arrows, be able to
write with pen upon paper exactly, precisely, his or her equation or
expression in one line of text. This hand written text could then be keyed
into a text editor where the arrow parentheses would appear on the screen as
symbols. However, upon processing through a suitable graphical display
processor, the limits of the definite integral would appear in the correct
place and the arrow parentheses would not show at all on the graphical
display. Thus, an author could write a handwritten manuscript and the
finished equation appear on screen in a professionally laid out standard
mathematical format in a straightforward sequence of events. This could
potentially reduce keying errors and save keying time.

end quote
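
As a purely illustrative sketch of the processing step described in the
quoted paragraph, the following Python fragment turns plain text containing
ARROW PARENTHESIS LEFT DOWN and ARROW PARENTHESIS RIGHT DOWN into LaTeX
subscript notation. The two code points used here, U+E300 and U+E301, are
placeholders assumed only for this example: the quoted paragraph suggests the
range U+E300 through U+E307 in the order of the glyphs in the pdf document,
and I do not know that these two positions are the right ones. The choice of
LaTeX as the output of the "graphical display processor" and the function
name are likewise assumptions.

```python
# Placeholder Private Use Area code points, assumed only for this sketch.
# The actual positions would follow the glyph order in The Bytext Standard.
ARROW_PARENTHESIS_LEFT_DOWN = "\uE300"
ARROW_PARENTHESIS_RIGHT_DOWN = "\uE301"


def subscripts_to_latex(text: str) -> str:
    """Replace each arrow parenthesis pair with LaTeX subscript braces.

    Assumes the pairs are balanced; no validation is attempted.
    """
    return (
        text.replace(ARROW_PARENTHESIS_LEFT_DOWN, "_{")
            .replace(ARROW_PARENTHESIS_RIGHT_DOWN, "}")
    )


# Plain text as it might be keyed into a text editor.
plain = (
    "x" + ARROW_PARENTHESIS_LEFT_DOWN + "1" + ARROW_PARENTHESIS_RIGHT_DOWN
    + " + x" + ARROW_PARENTHESIS_LEFT_DOWN + "2" + ARROW_PARENTHESIS_RIGHT_DOWN
)
print(subscripts_to_latex(plain))  # x_{1} + x_{2}
```

In a text editor the two private use characters would show as visible
symbols; only the conversion step hides them and moves the enclosed text
into the subscript position.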

An unfortunate problem in getting arrow parentheses accepted in Unicode is,
I feel, the traditional approach of separating characters into two
categories, namely plain text and mark up, with Unicode deliberately
avoiding including mark up characters.

I feel that the notion of there being just two categories needs modifying
as, I suggest, it is hindering progress.

Suppose that someone writes a piece of text. Almost always, the typeface in
which the text is set and printed does not affect the meaning of the text.
So, some text usually has the same meaning whether it is printed in the
Arial fount or in the Goudy Text fount. I write "usually" just in case
there is some text that refers to the typeface in which it is set, though
that is just a rare possibility. The choice of typeface is not part of the
meaning of a typical story or of a play or of a technical article or of
whatever, so it seems reasonable that the Unicode standard does not contain
within it characters that designate particular founts.

Yet, with subscripts, superscripts, limits of definite integrals and
summations and set unions the issue is one of meaning. My own view is that
the meaning of a piece of text can be greatly affected by whether a
subscript, superscript, limit of a definite integral and so on is properly
placed, so I suggest that the categorization of potential codes into just
two categories has become obsolete and that at least three categories are
needed. One category is plain linear text, one category is mark up text and
one category is plain nonlinear text. Plain nonlinear text needs, in my
personal view, to be encoded into the Unicode system, as it affects the
meaning of a document rather than simply how it appears artistically and
aesthetically. This is about open standards: by all means leave mark up for
definition by independent individuals and organizations, yet where the issue
is about the meaning of the text, then I feel that the fact that some plain
text is nonlinear needs to be addressed by the Unicode Consortium please.

On the issue of conveying meaning, the usual way to emphasise a word in a
printed document is to use italics. As the use of italics for a word may
lead a reader of a document to form a different understanding of the
document than if the italics had not been used, I feel that signalling the
start and finish of italics should be regarded as plain nonlinear text
rather than mark up. I suppose that if signalling the use of italics were
to be included in Unicode, then signalling the use of bold and bold italics
would also need to be included.

I feel that there is scope to have a small number of such signalling codes
available in Unicode, as otherwise plain text in Unicode cannot convey the
meanings that can normally be conveyed by plain printed text!

I realize that there is great resistance in some quarters to the
introduction of any characters that involve mark up issues. However, in the
interests of developing open standards, perhaps two codes modelled on the
general thrust of the Bytext arrow parentheses could be introduced into
Unicode with names such as CIRCLE PARENTHESIS LEFT and CIRCLE PARENTHESIS
RIGHT with the meanings that text between a CIRCLE PARENTHESIS LEFT and a
CIRCLE PARENTHESIS RIGHT is mark up text that is not defined within the
Unicode Standard. I have previously suggested some code points in the
private use area for the arrow parentheses, as a practical way to get them
available for people to try out experimentally in text editors and
graphical processors (and repeated the suggestion in the quoted text above).
Perhaps I may similarly suggest, not for the Unicode Consortium to comment
on, just as a matter between interested end users of the Unicode system,
that U+E340 could be used for CIRCLE PARENTHESIS LEFT and U+E341 could be
used for CIRCLE PARENTHESIS RIGHT. The symbols would be a left parenthesis
with a circle superimposed on the centre of the arc and a right parenthesis
with a circle superimposed on the centre of the arc, the circle being drawn
so as to enclose approximately one fifth of the length of the arc in each
case. The
meanings of the mark up text would be outside Unicode, yet the use of the
circle parentheses would provide a convenient escape mechanism so that mark
up could be included in a plain text file and could be easily ignored by a
plain text processor.
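
The escape mechanism can be sketched in a few lines of Python. This is only
a minimal illustration, assuming the private use code points U+E340 and
U+E341 suggested above, assuming that mark up spans do not nest, and
assuming that every CIRCLE PARENTHESIS LEFT has a matching CIRCLE
PARENTHESIS RIGHT. The function names are my own and are not part of any
standard.

```python
import re

# Private Use Area code points suggested in this post; not part of Unicode.
CIRCLE_PARENTHESIS_LEFT = "\uE340"
CIRCLE_PARENTHESIS_RIGHT = "\uE341"

# One mark up span, delimiters included. The non-greedy ".*?" stops at the
# first closing delimiter, so nested spans are NOT supported (an assumption).
_MARKUP_SPAN = re.compile(
    CIRCLE_PARENTHESIS_LEFT + "(.*?)" + CIRCLE_PARENTHESIS_RIGHT,
    re.DOTALL,
)


def strip_circle_markup(text: str) -> str:
    """What a plain text processor would do: drop every mark up span."""
    return _MARKUP_SPAN.sub("", text)


def extract_circle_markup(text: str) -> list[str]:
    """What a mark up aware processor would read: the contents of each span."""
    return _MARKUP_SPAN.findall(text)


sample = (
    "This is "
    + CIRCLE_PARENTHESIS_LEFT + "italic on" + CIRCLE_PARENTHESIS_RIGHT
    + "important"
    + CIRCLE_PARENTHESIS_LEFT + "italic off" + CIRCLE_PARENTHESIS_RIGHT
    + "."
)
print(strip_circle_markup(sample))    # This is important.
print(extract_circle_markup(sample))  # ['italic on', 'italic off']
```

The strings "italic on" and "italic off" are invented for the example; the
meaning of whatever appears between the circle parentheses would be defined
outside Unicode, as stated above.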

William Overington

9 March 2002


