L2/00-249

**From:** Karlsson Kent - keka [keka@im.se]

**Sent:** Thursday, August 03, 2000 6:49 AM

**To:** Multiple Recipients of Unicore

**Subject:** RE: UTC Agenda item: Mathematical Letter Symbols

Regarding the "math alphanumeric characters" proposal

-----------------------------------------------------------------------------

I've finally got some time to comment on this issue. I've been too busy editing a somewhat math-oriented document which does make distinctions between upright non-bold, bold, and italic versions of the same sequence of letters, as well as between bold and non-bold versions of the same symbols (for plus, minus, and infinity, as it happens). It also uses multi-letter identifiers in math expressions. That the identifiers are multi-letter is important: the document would be unreadable if single-letter identifiers had been used throughout.

I'm very strongly opposed to the "math alphanumeric characters" proposal.

As someone who would be a 'user' of the "math alphanumeric characters" if they were accepted and then used in, e.g., MathML, I very much fear the problems that will result: problems setting or changing the variety, problems with searches, and problems getting the desired identifiers in the desired variety. For example, I might not be able to get a bold "oändlig", or would at least have severe problems finding and using a work-around. This is not an unrealistic example: the document I've been busy with has the bold identifier "infinitary" (in math expressions!).

If I were to translate the document to Swedish, that would be a bold "oändlig". And the "math alphanumeric characters" do not allow me to write that!
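To make the "oändlig" problem concrete, here is a minimal Python sketch. It assumes the plane-1 code points that were eventually assigned in Unicode 3.1 (MATHEMATICAL BOLD CAPITAL A at U+1D400 and MATHEMATICAL BOLD SMALL A at U+1D41A); the helper `to_math_bold` is hypothetical. Only the unaccented letters A-Z and a-z have bold counterparts, so the English identifier converts while the Swedish one fails at the "ä".

```python
import unicodedata

# Start of the Mathematical Bold letters, as assigned in Unicode 3.1.
BOLD_CAPITAL_A = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_SMALL_A = 0x1D41A    # MATHEMATICAL BOLD SMALL A


def to_math_bold(word: str) -> str:
    """Hypothetical helper: spell `word` with Mathematical Bold letters.

    Raises ValueError for any character without a bold counterpart,
    which includes every precomposed accented letter such as 'ä'.
    """
    out = []
    for ch in word:
        if "a" <= ch <= "z":
            out.append(chr(BOLD_SMALL_A + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr(BOLD_CAPITAL_A + ord(ch) - ord("A")))
        else:
            raise ValueError(f"no Mathematical Bold form for {ch!r}")
    return "".join(out)


if __name__ == "__main__":
    bold = to_math_bold("infinitary")
    print(unicodedata.name(bold[0]))  # MATHEMATICAL BOLD SMALL I
    try:
        to_math_bold("oändlig")
    except ValueError as err:
        print(err)  # no Mathematical Bold form for 'ä'
```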

Character properties

The "math alphanumeric characters" are no more symbols than an ordinary letter is a symbol. So these characters, if adopted (which they definitely should NOT be), should unequivocally be given the general categories Lu, Ll, and Nd as appropriate, with compatibility (<font>) mappings to the ordinary letters and digits. Notice that even the proponents of these "math alphanumeric characters" seem to propose using the ordinary letters and digits in math expressions too (though it is not entirely clear exactly for what; upright non-bold letters and digits?).
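As it happens, these are the properties the characters eventually received. The minimal Python sketch below uses the standard `unicodedata` module (whose data reflects the Unicode version bundled with the interpreter, not the draft under discussion here) to print the name, general category, compatibility decomposition, and NFKC form of a few of them; the helper `describe` is hypothetical. The last lines show one consequence for searching: a plain search for "abc" misses a bold "abc" unless the text is folded with NFKC first.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, str]:
    """Return (name, general category, decomposition, NFKC form) of `ch`."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


if __name__ == "__main__":
    for ch in ("\U0001D41A", "\U0001D400", "\U0001D7CE"):
        print(describe(ch))
    # ('MATHEMATICAL BOLD SMALL A', 'Ll', '<font> 0061', 'a')
    # ('MATHEMATICAL BOLD CAPITAL A', 'Lu', '<font> 0041', 'A')
    # ('MATHEMATICAL BOLD DIGIT ZERO', 'Nd', '<font> 0030', '0')

    # A naive search misses bold text; searching the NFKC-folded text finds it.
    bold_abc = "\U0001D41A\U0001D41B\U0001D41C"
    print("abc" in bold_abc)                                 # False
    print("abc" in unicodedata.normalize("NFKC", bold_abc))  # True
```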

Notice that (Latin and Greek) letters in math expressions are most commonly italic; non-italic letters in math expressions are much more of an exception. That is why (La)TeX by default sets letters in math expressions in italic.

Alleged added mark-up verbosity

The only "hard and fast" argument for including these "math alphanumeric characters" appears to be that they "save some bandwidth", since using mark-up instead would be more verbose. This is, however, 100% false. If the mark-up scheme is designed in any reasonable way, using mark-up instead is (marginally) LESS verbose than using these "math alphanumeric characters".

Example:

"Math alphanumeric characters" (in a MathML setting):

<mi>abc</mi> (upright non-bold???)

<mi>&bolda;&boldb;&boldc;</mi> (upright bold)

<mi>&fraka;&frakb;&frakc;</mi> (fraktur)

(etc. for the other varieties, fewer than a handful in all)

(One possible, reasonably designed) "mark-up instead" alternative:

<mr>abc</mr> (upright non-bold)

<mb>abc</mb> (upright bold)

<mf>abc</mf> (fraktur)

(etc. for the other varieties, fewer than a handful in all)

Shortening the entity names, or using the "math alphanumeric characters" directly (in UTF-8 or UTF-16), which the proponents apparently suggest, is still more verbose than the alternative mark-up version given here.
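The byte counts can be checked mechanically. The following minimal Python sketch assumes the hypothetical entity names (&bolda; etc.) and the hypothetical <mb> tag from the example above, plus the code point later assigned to MATHEMATICAL BOLD SMALL A (U+1D41A); `encoded_sizes` is likewise a hypothetical name. Every plane-1 character costs 4 bytes in UTF-8 and a 4-byte surrogate pair in UTF-16, while a Basic Latin letter costs 1 byte in UTF-8 and 2 in UTF-16.

```python
BOLD_SMALL_A = 0x1D41A  # MATHEMATICAL BOLD SMALL A (Unicode 3.1)


def encoded_sizes(word: str) -> dict[str, int]:
    """Byte counts for three encodings of a lowercase bold identifier."""
    if not all("a" <= c <= "z" for c in word):
        raise ValueError("this sketch only handles the letters a-z")
    entities = "<mi>" + "".join(f"&bold{c};" for c in word) + "</mi>"
    direct = "<mi>" + "".join(chr(BOLD_SMALL_A + ord(c) - ord("a")) for c in word) + "</mi>"
    markup = "<mb>" + word + "</mb>"
    return {
        "entities (UTF-8)": len(entities.encode("utf-8")),
        "direct (UTF-8)": len(direct.encode("utf-8")),
        "direct (UTF-16)": len(direct.encode("utf-16-le")),  # no BOM
        "mark-up (UTF-8)": len(markup.encode("utf-8")),
        "mark-up (UTF-16)": len(markup.encode("utf-16-le")),
    }


if __name__ == "__main__":
    for label, size in encoded_sizes("abc").items():
        print(f"{label:18} {size:3} bytes")
```

For "abc" it reports 30 bytes for the entity form, 21 and 30 bytes for the direct UTF-8 and UTF-16 forms, and 12 and 24 bytes for the mark-up form.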

There are only a handful of varieties; I'm NOT suggesting that each and every font difference counts. (I'm also avoiding the word "style", since some people seem to misunderstand what that would mean.)

Bold (non-alphanumeric) symbols

If "math alphanumeric characters" are 'needed' because of semantic distinctions between the few varieties, then all "math symbols" (category Sm) also need to be duplicated in bold versions. Is this the plan? If not, why not? Bold symbols are sometimes used in a semantically distinct way relative to the corresponding non-bold symbol. The reasoning is the same for "math alphanumeric characters" and "bold math symbols", so the two should be treated the same way when it comes to encoding considerations!

In LaTeX, bold symbols are obtained via the \boldsymbol command or via the \pmb command. (\pmb is "poor man's bold", which simulates bold by overtyping. It is handy if the desired symbol is not available in true bold in the installed symbol font.)

Semantic significance

The different varieties of letters in math, like italic, bold, and fraktur, do signify a semantic difference, and so do bold vs. non-bold versions of other (Sm) symbols. This does not mean that the difference needs to be mediated through different character allocations. Indeed, MathML makes a semantic distinction between <mi> and <mn>, as well as a host of other such distinctions. There is no reason why MathML, and similar mark-up schemes, could not make the difference between, say, italic and fraktur a mark-up one: <mi> (italic), <mf> (fraktur).

Math is inherently "non-plain" text

Very little math can be written without mark-up of some sort. Even Murray's "plain text math" is a kind of mark-up (one of its very own).

Multi-letter identifiers and I18n

Some branches of math, computing science in particular, use multi-letter identifiers in mathematical expressions as well. If these identifiers come from any language other than English, and so may contain letters such as "ä" that have no "math alphanumeric" counterpart, making them bold, for example, suddenly requires a different mechanism. It is very unlikely that any system geared towards using "math alphanumeric characters" will handle this gracefully. Likewise, making symbols bold will require a separate mechanism, unless you plan to also allocate "bold math symbols" as separate characters.

Old TeX vs. modern LaTeX

Old TeX used commands like \calE to get a calligraphic ('script') E. Each available letter in each available 'math' variety had its own command. This is very similar to the "math alphanumeric characters" proposal.

However, modern LaTeX has abandoned that approach and instead uses parametric commands, where the parameter is the letters (plural!) to be set in a particular variety, e.g. \mathcal{E} to get a calligraphic ('script') E. This way multi-letter identifiers can be handled gracefully, and in principle such identifiers (in math expressions!) need not be derived from *English* words, but can come from some other language.

LaTeX math identifier 'commands' (cf. 'mark-up'):

\mathit{abc} Italic (in principle, default for single-letter identifiers in LaTeX)

\mathbf{abc} Bold identifiers

\mathrm{abc} Upright, non-bold (typically: "sup", "sin", "lim", ...)

\mathcal{abc} "Calligraphic"/"Script" identifiers

\mathsf{abc} Sans-serif identifiers

\mathtt{abc} "Teletype"/"monospace"/"typewriter" identifiers

\frak{abc} Fraktur identifiers (amstex package)

\Bbb{abc} Double-struck (blackboard bold) identifiers (amstex package)

There is nothing *in principle* preventing "internationalised" identifiers here.

Note that LaTeX (with the amstex package) also has:

\boldsymbol{+} Bold symbols (incl. sequences of symbols, e.g. \boldsymbol{+\infty})

\pmb{+} Fallback for bold symbols ("poor man's bold"; does overtyping; useful if the symbol font does not have the desired symbol(s) in "true" bold)
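Put together, the commands above might be used like this. This is a sketch only, assuming a LaTeX2e document that loads \usepackage{amsmath,amssymb}, where the fraktur and double-struck alphabets are available as \mathfrak and \mathbb.

```latex
% Sketch: assumes \usepackage{amsmath,amssymb} in the preamble.
\[
  \mathbf{infinitary} \neq \mathit{infinitary}, \qquad
  \mathrm{sup}, \quad \mathcal{E}, \quad \mathfrak{g}, \quad \mathbb{R}, \qquad
  \boldsymbol{+\infty} \text{ vs.\ } +\infty, \qquad \pmb{+}
\]
```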

There is no problem introducing similar mark-up distinctions in MathML-ish schemes, for example like this (just one example of how it could be done):

<mi>abc</mi> Italic identifiers

<mb>abc</mb> Bold identifiers

<mr>abc</mr> Upright, non-bold (typically: "sup", "sin", "lim", ...)

<mc>abc</mc> "Calligraphic"/"Script" identifiers

<ms>abc</ms> Sans-serif identifiers

<mt>abc</mt> "Teletype"/"monospace"/"typewriter" identifiers

<mf>abc</mf> Fraktur identifiers

<md>abc</md> Double-struck (blackboard bold) identifiers

<mn>123</mn> Upright non-bold numerals

<mm>123</mm> Bold numerals

<ml>123</ml> Italic numerals

<mo>+</mo> Non-bold symbols

<mp>+</mp> Bold symbols

There is nothing in principle preventing "internationalised" identifiers here.
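The tags above map almost one-to-one onto the LaTeX commands listed earlier. As a minimal Python sketch of that correspondence (the tag set is the hypothetical one proposed here, with only <mi>, <mn> and <mo> being actual MathML elements, and both the table and `to_latex` are assumptions for illustration), a translator needs one table entry per variety, and the letters of the identifier pass through untouched whatever the language. The output is LaTeX source only; whether a given TeX setup can typeset a letter like "ä" in math mode is a separate matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag -> LaTeX command table ("" means no wrapping command).
TAG_TO_LATEX = {
    "mi": r"\mathit",      # italic identifiers
    "mb": r"\mathbf",      # bold identifiers
    "mr": r"\mathrm",      # upright, non-bold identifiers
    "mc": r"\mathcal",     # calligraphic/script identifiers
    "ms": r"\mathsf",      # sans-serif identifiers
    "mt": r"\mathtt",      # typewriter identifiers
    "mf": r"\mathfrak",    # fraktur identifiers
    "md": r"\mathbb",      # double-struck identifiers
    "mn": r"\mathrm",      # upright, non-bold numerals
    "mm": r"\mathbf",      # bold numerals
    "ml": r"\mathit",      # italic numerals
    "mo": "",              # non-bold symbols
    "mp": r"\boldsymbol",  # bold symbols
}


def to_latex(fragment: str) -> str:
    """Translate one tagged token, e.g. '<mb>abc</mb>', into LaTeX source."""
    elem = ET.fromstring(fragment)
    text = elem.text or ""
    command = TAG_TO_LATEX[elem.tag]  # KeyError for an unknown tag
    return f"{command}{{{text}}}" if command else text


if __name__ == "__main__":
    print(to_latex("<mb>infinitary</mb>"))  # \mathbf{infinitary}
    print(to_latex("<mb>oändlig</mb>"))     # \mathbf{oändlig}
    print(to_latex("<mp>+</mp>"))           # \boldsymbol{+}
```

Supporting a new variety would mean adding one entry to the table, not a new block of characters.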

This method does not affect Unicode in any way: no new characters at all. Yet it allows for 1) internationalised multi-letter identifiers and 2) bold symbols, without any private-use characters, plane 1 characters, or bold clones of symbols. It is also more general and flexible.

If mathematics were to develop so that, say, italic sans-serif became a new recognised variety, no new characters would need to be added, just a new tag in the mark-up scheme.

Existing "math alpha chars" should NOT be used

The existing "math alphanumeric" characters in the BMP (such as U+211B SCRIPT CAPITAL R) should NOT be used, in particular not with mark-up schemes that can (and should) make the distinction by mark-up (like <mi>i</mi>, <mc>R</mc>, etc.). That these characters were ever encoded should be regarded as a mistake.

/Kent Karlsson