From: Mark Davis [markdavis@ispchannel.com]
Sent: Thursday, August 03, 2000 11:56 AM
To: Multiple Recipients of Unicore
Subject: Re: UTC Agenda item: Mathematical Letter Symbols

I am concerned about the math clone characters. During the long discussions over the years with representatives of the math community, these characters were sold to us on the basis that they were required in plain-text processing. On that basis, the UTC advanced them to the next level, and they are now part of the current FCD 10646-1. Cf.

  http://www.unicode.org/unicode/members/L2000/n3442/02n34421_pi-38.pdf

For reasons mentioned elsewhere, they could cause not only considerable confusion among users, but also problems for software processes and security risks in the form of spoofing. Under some choice of style or font, they are all identical in appearance to normal letters and numbers, e.g.

  1D680 MATHEMATICAL MONOWIDTH CAPITAL Q
  1D7E2 MATHEMATICAL SANS DIGIT 0

Although intended for math implementations, these characters will clearly leak into normal environments. If these characters are to be in Unicode, then our goal must be to make sure that they are useful in their intended implementation context while limiting the damage they can do elsewhere.

One of the tools we have to address that is to give them the correct properties to reflect their real status as symbols, not as letters or numbers. That is, assign them as So (Symbol, Other), with no numeric value, no case property, and no case mapping. In other words, don't give them properties like those of letters or digits, such as:

  0051;LATIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;0071;
  0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
  etc.

Instead, give them properties like those of other symbols:

  2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;
  235C;APL FUNCTIONAL SYMBOL CIRCLE UNDERBAR;So;0;L;;;;;N;;;;;
  etc.

In particular, assigning them the value 'So' will keep them out of the recommended programming identifier syntax.
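To illustrate why the category matters for identifiers, here is a minimal Python sketch that reads records in the UnicodeData.txt format and applies a simplified identifier rule keyed only on the General Category field (field 2, zero-based). The sets `ID_START` and `ID_CONTINUE` are an assumption: they approximate the identifier recommendation rather than reproduce its normative text. The two math clone records are hypothetical; they encode the So category proposed in this message, with the bidi class L chosen arbitrarily for illustration.

```python
# Simplified identifier check over UnicodeData.txt-style records.
# Assumption: only the General Category (field 2) decides eligibility;
# these sets approximate, but are not, the normative identifier rules.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE = ID_START | {"Mn", "Mc", "Nd", "Pc"}

# The 1D680 and 1D7E2 records are hypothetical: they encode the proposed So category.
RECORDS = """\
0051;LATIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;0071;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;
1D680;MATHEMATICAL MONOWIDTH CAPITAL Q;So;0;L;;;;;N;;;;;
1D7E2;MATHEMATICAL SANS DIGIT 0;So;0;L;;;;;N;;;;;
"""


def parse(records):
    """Return {code point: general category} for each non-empty record."""
    table = {}
    for line in records.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        table[int(fields[0], 16)] = fields[2]
    return table


def is_identifier(text, table):
    """True if text starts with an ID_START char and continues with ID_CONTINUE chars.

    Characters missing from the table are treated as ineligible.
    """
    if not text:
        return False
    if table.get(ord(text[0])) not in ID_START:
        return False
    return all(table.get(ord(ch)) in ID_CONTINUE for ch in text[1:])


TABLE = parse(RECORDS)
print(is_identifier("Q0", TABLE))                # True
print(is_identifier("\U0001D680" + "0", TABLE))  # False: clone Q is So
print(is_identifier("Q" + "\U0001D7E2", TABLE))  # False: clone 0 is So
```

With the clones classed as So, the ordinary `Q0` passes while the look-alike sequences are rejected; under this sketch, giving the clones Lu and Nd instead would let `\U0001D680` + `0` through as an identifier.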
I strongly feel that this is the correct way to go. We don't want these clones, with all their potential for spoofing, to occur in programming identifiers, XML tag names, Java class file names, etc. (Note that Java class names, which are identifiers, are mirrored in the file names of both the source and the binary.) Math equations will have their own rules for identifiers; those should not be confused with the standard recommendations for normal text processing. As Murray points out, "...the characters are separate symbols, e.g., they don't get grouped into natural language words" (unicode@unicode.org, Mon, 17 Jul 2000).

These characters should also not have case mappings: where characters are treated as math symbols, case is not just a minor variation; they change meaning when they change case.

I realize quite well that this approach changes the direction we had been following with regard to the letter-like symbols, but we have *not* had complete copies of alphabets before, so what was a small cyst has the prospect of becoming a malignant tumor. (Ok, the language is a bit overblown, but you get my point.)

Now there is a complication: what to do about the current letter-like symbols, such as:

  2112;SCRIPT CAPITAL L;Lu;0;L;<font> 004C;;;;N;SCRIPT L;;;;
  2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;

This issue is important because these letters are used to "fill in" holes in the new allocations:

  1D454 MATHEMATICAL ITALIC SMALL G
  1D455 (This position shall not be used)
  1D456 MATHEMATICAL ITALIC SMALL I

Instead of 1D455, one is to use (I believe) the existing letter-like italic small h:

  210E;PLANCK CONSTANT;Ll;0;L;<font> 0068;;;;N;;;;;

Luckily, these characters are not in frequent use, so if we need to change their properties at this point for consistency, we have a certain degree of freedom. (This would also help to resolve some anomalies where characters have case but no case mappings; cf. http://www.unicode.org/unicode/reports/tr21/charts/CaseChart7.html.)
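The "case but no case mappings" anomaly can be checked mechanically. A minimal Python sketch, assuming the records quoted in this message (with the `<font>` decomposition tag written out as it appears in UnicodeData.txt): it flags any character whose General Category is cased (Lu, Ll, Lt) but whose simple uppercase, lowercase, and titlecase mapping fields (zero-based fields 12 to 14) are all empty. The function name `cased_without_mappings` is hypothetical.

```python
# Flag characters that have a cased General Category but no case mappings.
# Fields (zero-based): 1 = name, 2 = General Category,
# 12/13/14 = simple uppercase/lowercase/titlecase mappings.
CASED = {"Lu", "Ll", "Lt"}

RECORDS = """\
0051;LATIN CAPITAL LETTER Q;Lu;0;L;;;;;N;;;;0071;
2112;SCRIPT CAPITAL L;Lu;0;L;<font> 004C;;;;N;SCRIPT L;;;;
2118;SCRIPT CAPITAL P;So;0;ON;;;;;N;SCRIPT P;;;;
210E;PLANCK CONSTANT;Ll;0;L;<font> 0068;;;;N;;;;;
"""


def cased_without_mappings(records):
    """Return names of cased characters whose three case-mapping fields are empty."""
    flagged = []
    for line in records.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        name, category = fields[1], fields[2]
        upper, lower, title = fields[12], fields[13], fields[14]
        if category in CASED and not (upper or lower or title):
            flagged.append(name)
    return flagged


print(cased_without_mappings(RECORDS))
# ['SCRIPT CAPITAL L', 'PLANCK CONSTANT']
```

In this sketch, reclassifying the "filler" characters as So (as in option 1.a) would drop them from the flagged list, since So is not a cased category.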
I am sympathetic to Ken's call to arms to control the properties of Unicode characters more closely, and in particular to make all the general category properties normative. (Cf. http://www.unicode.org/Public/UNIDATA/UnicodeData.html.) Were it not for the looming prospect of the full set of math clones, I would say just let sleeping dogs lie. However, we are faced with that situation and need to consider all the ramifications. We can't lock the barn before making sure that the horses are in their stalls. (Ok, mixing metaphors.) Once we fix this issue, I think we are ready to take the step of making all the general category properties normative.

To recapitulate, we are faced with two main choices for the math clones:

1. Make the math clones symbols,
   1.a. and revise the properties of the "filler" letter-like symbols for consistency;
   1.b. and leave the letter-like symbols as is, accepting the inconsistency; or
   1.c. and leave the letter-like symbols as is, filling in the holes such as 1D455.
2. Make the math clones like the current letter-like symbols.

To limit the damage that these characters do, I strongly feel that we should choose #1. I have my favorite among 1a, 1b, and 1c, but any of them would be better than #2.

Mark