From: Mark Davis [markdavis@ispchannel.com]
Sent: Thursday, August 03, 2000 11:56 AM
To: Multiple Recipients of Unicore
Subject: Re: UTC Agenda item: Mathematical Letter Symbols

I am concerned about the math clone characters. During the long
discussions over the years with representatives of the math community,
these characters were sold to us on the basis that they were required
in plain-text processing. On that basis, the UTC advanced them to the
next level, and they are now a part of the current FCD 10646-1. Cf.
http://www.unicode.org/unicode/members/L2000/n3442/02n34421_pi-38.pdf

For reasons mentioned elsewhere, they have the potential to cause
not only considerable confusion among users, but also problems for
software processes and security risks in terms of spoofing. They are
all identical in appearance to normal letters and numbers under some
choice of style or font, e.g.

1D680 MATHEMATICAL MONOWIDTH CAPITAL Q
1D7E2 MATHEMATICAL SANS DIGIT 0

Although intended for math implementations, these characters will
clearly leak into normal environments. If these characters are to be in
Unicode, then our goal must be to make sure that they are useful in
their intended implementation context, but limit the damage that they
can do elsewhere.

One of the tools we have to address that is to give them the correct
properties to reflect their real status as symbols, not as letters or
numbers. That is, assign them as So (Symbol, Other), with no numeric
value, no case property, no case mapping.

In other words, don't give them properties like letters or digits,
such as:

0051;LATIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;0071;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
etc.

Instead, give them properties like those of other symbols:

2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;
235C;APL FUNCTIONAL SYMBOL CIRCLE UNDERBAR;So;0;L;;;;;N;;;;;
etc.
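
As a rough illustration of the difference (a sketch only, not part of
any proposal), the records quoted above can be split on semicolons and
their fields compared. The field order is assumed to follow the
UnicodeData layout referenced in this message; the names FIELDS,
parse_record and behaves_like_letter_or_digit are hypothetical.

```python
# Field names are illustrative labels for the 15 semicolon-separated
# columns of a UnicodeData-style record (an assumption of this sketch).
FIELDS = (
    "code", "name", "category", "combining", "bidi", "decomposition",
    "decimal", "digit", "numeric", "mirrored", "old_name", "comment",
    "upper", "lower", "title",
)


def parse_record(line):
    """Split one record into a dict keyed by the labels in FIELDS."""
    values = line.strip().split(";")
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return dict(zip(FIELDS, values))


def behaves_like_letter_or_digit(record):
    """True if the record has a letter/number category, a numeric value,
    or any case mapping; these are the properties argued against here."""
    has_case_mapping = any(record[k] for k in ("upper", "lower", "title"))
    has_numeric_value = any(record[k] for k in ("decimal", "digit", "numeric"))
    is_letter_or_number = record["category"][:1] in ("L", "N")
    return is_letter_or_number or has_case_mapping or has_numeric_value


letter = parse_record("0051;LATIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;0071;")
digit = parse_record("0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;")
symbol = parse_record("2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;")
```

Under this check the first two records count as letter-like or
digit-like, while the 'So' record does not; the suggestion is that the
math clones should look like the third record.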

In particular, assigning them the value 'So' will cause them not to be
included in the recommended programming identifier syntax. I strongly
feel that this is the correct way to go. We don't want these
clones, with all their possibilities for spoofing, to occur in
programming identifiers, XML tag names, Java class file names,
etc. (Note that Java class names -- identifiers -- are mirrored in the
file name for both the source and binary.)
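
To make the effect on identifiers concrete, here is a minimal sketch.
The rule it uses is a simplified stand-in for the recommended
identifier syntax (letter categories to start, plus marks, decimal
digits and connector punctuation to continue), and the two category
tables are hand-built for illustration: one with the clone 1D680 as
'So' as argued here, one with it as 'Lu'. None of the names below come
from a real library.

```python
# Simplified identifier rule, assumed for illustration only.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE = ID_START | {"Mn", "Mc", "Nd", "Pc"}


def is_identifier(text, categories):
    """Check text against the simplified rule, using a caller-supplied
    mapping from code point to general category."""
    if not text:
        return False
    cats = [categories.get(ord(ch)) for ch in text]
    if cats[0] not in ID_START:
        return False
    return all(cat in ID_CONTINUE for cat in cats[1:])


# Hypothetical tables: the clone treated as a symbol vs. as a letter.
CLONE_AS_SYMBOL = {0x0051: "Lu", 0x0030: "Nd", 0x1D680: "So"}
CLONE_AS_LETTER = {0x0051: "Lu", 0x0030: "Nd", 0x1D680: "Lu"}

plain = "Q0"                   # LATIN CAPITAL LETTER Q, DIGIT ZERO
spoof = chr(0x1D680) + "0"     # math clone of Q, DIGIT ZERO

plain_ok = is_identifier(plain, CLONE_AS_SYMBOL)
spoof_if_symbol = is_identifier(spoof, CLONE_AS_SYMBOL)
spoof_if_letter = is_identifier(spoof, CLONE_AS_LETTER)
```

With the clone as 'So', the look-alike string is rejected as an
identifier; with the clone as 'Lu', both strings are accepted and can
be confused with each other.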

Math equations will have their own rules for identifiers; those should
not be confused with the standard recommendations for normal text
processing. As Murray points out, "...the characters are separate
symbols, e.g., they don't get grouped into natural language words"
(unicode@unicode.org Mon, 17 Jul 2000)

These characters should also not have case mappings -- where
characters are treated as math symbols, case is not just a minor
variation: the symbols change meaning when they change case.

I realize quite well that this approach changes the direction that we
had been following with regard to the letter-like symbols, but we have
*not* had complete copies of alphabets before, so what was a small
cyst has the prospect of becoming a malignant tumor. (Ok, the language
is a bit overblown, but you get my point).


Now there is a complication: what to do about the current letter-like
symbols, such as:

2112;SCRIPT CAPITAL L;Lu;0;L;<font> 004C;;;;N;SCRIPT L;;;;
2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;

This issue is important, because these letters are used to 'fill in'
holes in the new allocations.

1D454 MATHEMATICAL ITALIC SMALL G
1D455 (This position shall not be used)
1D456 MATHEMATICAL ITALIC SMALL I

Instead of 1D455, one is to use (I believe) the existing letter-like
italic small h:

210E;PLANCK CONSTANT;Ll;0;L;<font> 0068;;;;N;;;;;
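
The consequence for software is that an implementation mapping plain
letters to the math italic range has to special-case the hole. A
minimal sketch, assuming the italic small letters are otherwise
allocated contiguously in alphabetical order (only g, the hole, and i
are shown above; the rest is an assumption), with hypothetical names
throughout:

```python
# Position taken from the allocation quoted above.
ITALIC_SMALL_G = 0x1D454
# Assumed contiguous a..z layout, so 'a' sits 6 positions before 'g'.
ITALIC_SMALL_A = ITALIC_SMALL_G - (ord("g") - ord("a"))
# The unused position is filled by the existing letter-like symbol.
HOLES = {0x1D455: 0x210E}  # italic small h -> PLANCK CONSTANT


def math_italic_small(letter):
    """Return the code point to use for the math italic form of a..z."""
    if len(letter) != 1 or not ("a" <= letter <= "z"):
        raise ValueError("expected a single letter a-z")
    code_point = ITALIC_SMALL_A + (ord(letter) - ord("a"))
    return HOLES.get(code_point, code_point)
```

Here math_italic_small("h") yields 210E rather than 1D455, so the
'filler' character inherits whatever properties the letter-like symbol
has, which is why its properties matter for consistency.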

Luckily, these characters are not in frequent use, so if we need to
change their properties at this point for consistency, we have a
certain degree of freedom. (This would also help to resolve some
anomalies in having characters with case, but no case mappings:
http://www.unicode.org/unicode/reports/tr21/charts/CaseChart7.html.)


     I am sympathetic to Ken's call to arms to more closely
     control the properties for Unicode characters, and in
     particular to make all the general category properties
     normative. (Cf.
     http://www.unicode.org/Public/UNIDATA/UnicodeData.html).

     Were it not for the looming prospect of the full set of math
     clones, I would say just let sleeping dogs lie. However, we
     are faced with that situation, and need to consider all the
     ramifications. We can't lock the barn before making sure
     that the horses are in their stalls. (ok, mixing metaphors)
     Once we fix this issue, then I think we are ready to take
     the step of making all the general category properties
     normative.

To recapitulate, we are faced with two main choices for the math
clones:

1. Make the math clones symbols.
1.a. and revise the properties for the 'filler' letter-like symbols
for consistency.
1.b. and leave the letter-like symbols as is, accepting the
inconsistency.
1.c. and leave the letter-like symbols as is, filling in the holes such
as 1D455.

2. Make the math clones like the current letter-like symbols.

To limit the damage that these characters do, I strongly feel that we
should choose #1. I have my favorite among 1a, 1b, and 1c, but any
would be better than #2.

Mark