Comments on Public Review Issues (August 10, 2005 - October 28, 2005)

The sections below contain comments received on the open Public Review Issues as of October 28, 2005, since the previous cumulative document was issued prior to UTC #104 (August 2005).

Closed Issue 57: Changes to Bidi categories of some characters used with Mathematics

Date/Time: Sat Oct 8 16:04:32 CST 2005
Contact: uriber@gmail.com
Name: Uri Bernstein

I posted regarding this issue to the public Unicode mailing list, but after not receiving any response, I was advised to report it here as well.

I recently became aware of some changes made in Unicode 4.1.0 to the bidi categories of some characters, most noticeably U+2212 MINUS SIGN. These changes were presented as PRI 57, and approved on 2005-02-14.

Unfortunately, this change made it impossible to embed negative numbers (such as "-5") in right-to-left text, without resorting to bidi control characters. After the bidi category of U+002D HYPHEN-MINUS was changed (between 4.0 and 4.0.1), that character could no longer be used as a unary minus in RTL text (as it would appear to the right of the following number, instead of to its left). The solution was using the MINUS SIGN character instead. But with this recent change, this no longer works.

I would like to point out that referring to negative numbers in RTL text is quite common even outside the realm of mathematics, such as in weather forecasts.

Similarly, the changes made to U+207A SUPERSCRIPT PLUS SIGN and U+207B SUPERSCRIPT MINUS prevent their usage in RTL text (without the aid of bidi control characters) for chemical notations such as OH⁻ and H⁺.
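(Illustrative sketch, not part of the submitted comment.) The bidi categories of the four characters named above can be inspected with Python's standard `unicodedata` module. The result reflects whichever Unicode version is bundled with the interpreter, so it shows the current assignments rather than the 4.0 or 4.0.1 values; the `CHARS` table and the `bidi_classes` helper are names invented for this example.

```python
import unicodedata

# Characters discussed in the comment, keyed by a readable label.
CHARS = {
    "U+002D HYPHEN-MINUS": "\u002D",
    "U+2212 MINUS SIGN": "\u2212",
    "U+207A SUPERSCRIPT PLUS SIGN": "\u207A",
    "U+207B SUPERSCRIPT MINUS": "\u207B",
}


def bidi_classes(chars):
    """Map each label to the Bidi_Class abbreviation of its character."""
    return {label: unicodedata.bidirectional(ch) for label, ch in chars.items()}


if __name__ == "__main__":
    for label, bidi_class in bidi_classes(CHARS).items():
        print(f"{label}: {bidi_class}")
```

With a Unicode database at version 4.1 or later, all four are expected to report the same class, European Separator.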

I would like to know what was the reasoning behind these changes. I would also like to know what is currently the recommended way of embedding a negative number (or a chemical ion symbol) in RTL text.
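(Illustrative sketch, not part of the submitted comment and not a stated UTC recommendation.) The comment refers to "resorting to bidi control characters". One way that can look in practice is to wrap the number in a left-to-right embedding, using U+202A LEFT-TO-RIGHT EMBEDDING and U+202C POP DIRECTIONAL FORMATTING. The helper name `embed_ltr` and the sample Hebrew word are assumptions made for this example.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def embed_ltr(fragment: str) -> str:
    """Wrap a fragment so that it is laid out left-to-right as a unit."""
    return f"{LRE}{fragment}{PDF}"


# A negative number written with U+2212 MINUS SIGN.
negative_five = "\u2212" + "5"

# An arbitrary Hebrew word followed by the embedded number.
sentence = "\u05E9\u05DC\u05D5\u05DD " + embed_ltr(negative_five)

if __name__ == "__main__":
    # Print code points rather than glyphs, so the output does not depend
    # on how the terminal itself applies the bidi algorithm.
    print(" ".join(f"U+{ord(ch):04X}" for ch in sentence))
```

The same wrapper would apply to an ion symbol such as OH followed by U+207B.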

74 Change to Default Localization for NaN in CLDR

(Feedback for this issue nominally goes to CLDR, but one report arrived via the reporting form this period.)

Date/Time: Sat Oct 8 04:22:48 CST 2005
Contact: WOverington@ngo.globalnet.co.uk
Name: William Overington

Public Review Issue 74 Change to Default Localization for NaN in CLDR


There has been a request to change the default localization for a NaN from the character U+FFFD (?) REPLACEMENT CHARACTER to another representation. The NaN floating-point value means "Not a Number", and represents an undefined result of a mathematical operation such as (0 ÷ 0) or (8 - 8). Unfortunately, there is no generally accepted mathematical symbol for NaN (e.g., from the American Mathematical Society). The character currently used as the default (root) localization follows Java usage, where it was originally chosen because it is a symbol (thus not an English-specific abbreviation), and has a sense that roughly corresponds to NaN. The CLDR technical committee is somewhat reluctant to make a change, given that this has been in use in Java for many years. If there is a change, possibilities are to revert to the English abbreviation "NaN" or to chose another character such as U+26A0 (?) WARNING SIGN. The committee would appreciate comments on this issue.

end quote
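(Illustrative sketch, not part of the submitted comment.) The quoted issue concerns which string a formatter emits for a NaN value. The minimal Python example below produces a NaN by subtracting infinity from infinity and formats it with a configurable symbol. The function `format_float` and its `nan_symbol` parameter are hypothetical names for this example, not CLDR or Java API.

```python
import math

REPLACEMENT_CHARACTER = "\uFFFD"  # the root-locale default described in the issue


def format_float(value: float, nan_symbol: str = REPLACEMENT_CHARACTER) -> str:
    """Format a float, substituting a locale-chosen symbol for NaN."""
    if math.isnan(value):
        return nan_symbol
    return repr(value)


# An undefined result: infinity minus infinity yields NaN.
undefined = float("inf") - float("inf")

if __name__ == "__main__":
    print(format_float(undefined))          # symbol-based default
    print(format_float(undefined, "NaN"))   # English abbreviation alternative
    print(format_float(1.5))
```

Keeping the symbol a parameter mirrors the choice under review: the formatting logic is unchanged whichever representation is selected.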

My Quest text font has had a distinctive glyph for U+FFFD for a long time.

I wonder if the request arises from many fonts not having a distinctive glyph for U+FFFD and thus returning the notdef glyph or nothing at all.

I am wondering whether the solution to the request might best be to leave the default localization as it is and to ask generally that fontmakers try to ensure that a distinctive glyph is encoded for U+FFFD in those of their fonts which they consider might be used for outputting the results of a calculation, whether that calculation is using Java or otherwise.

Quest text is available as a free download from the following web page.


William Overington

8 October 2005

75 Proposed Update UTR #25, Unicode Support for Mathematics

Date/Time: Fri Aug 26 07:22:12 CDT 2005
Contact: dominique.couturier@siemens.com
Name: Dominique COUTURIER

I refer to UTR #25 Unicode Support for Mathematics. Date 2005-08-17. Version http://www.unicode.org/reports/tr25-7.html. Revision 7 (7d5).

Please consider the following typos.

Section 2.3. Sub-section "Typestyle for Script Characters". 2nd paragraph. Last sentence. The code for MATHEMATICAL SCRIPT CAPITAL P is U+1D4AB not U+1D4A8.

Section 2.3.2. 2nd paragraph. Last sentence. "Using U+2275 or U+2276 followed by U+20D2" should read "Using U+2276 or U+2277 followed by U+20D2" (because U+2275 is the code for NEITHER GREATER-THAN NOR EQUIVALENT TO)

Section 2.3.2. 3rd paragraph. First sentence. "Via combination of U+2275 or U+2276 with U+20D2" should read "Via combination of U+2276 or U+2277 with U+20D2" (same reason as above).

Section 2.5. 3rd paragraph. First sentence. "The left and right angle brackets at U+2328 and U+2329" should read "The left and right angle brackets at U+2329 and U+232A" (because U+2328 is the code for KEYBOARD)

Section 2.13. Table 2.5. Row "Left parenthesis". Column "3-row". "239B, 239D" should read "239B, 239C, 239D". (add 239C: LEFT PARENTHESIS EXTENSION)

Section 2.17. Table 2.6. Row (Std Symbol)"2278". Column "Alternate symbol". "2278, 20D2" should read "2276, 20D2". (because the code for LESS-THAN OR GREATER-THAN is U+2276)

Section 2.17. Table 2.6. Row (Std Symbol)"2279". Column "Alternate symbol". "2279, 20D2" should read "2277, 20D2". (because the code for GREATER-THAN OR LESS-THAN is U+2277)
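(Illustrative sketch, not part of the submitted comment.) The character names cited in the corrections above can be cross-checked against the Unicode Character Database shipped with Python. The table `EXPECTED_NAMES` simply restates the code point and name pairs given in the comment; the helper `mismatches` is a name invented for this example.

```python
import unicodedata

# Code point -> name, exactly as stated in the corrections above.
EXPECTED_NAMES = {
    0x1D4AB: "MATHEMATICAL SCRIPT CAPITAL P",
    0x2275: "NEITHER GREATER-THAN NOR EQUIVALENT TO",
    0x2276: "LESS-THAN OR GREATER-THAN",
    0x2277: "GREATER-THAN OR LESS-THAN",
    0x2328: "KEYBOARD",
    0x239C: "LEFT PARENTHESIS EXTENSION",
}


def mismatches(expected):
    """Return {code point: actual name} for entries that disagree with the UCD."""
    result = {}
    for code_point, name in expected.items():
        actual = unicodedata.name(chr(code_point), "<unassigned>")
        if actual != name:
            result[code_point] = actual
    return result


if __name__ == "__main__":
    problems = mismatches(EXPECTED_NAMES)
    for code_point, actual in problems.items():
        print(f"U+{code_point:04X} is actually {actual}")
    if not problems:
        print("All cited names match the character database.")
```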

Best regards.
D. Couturier

Date/Time: Sat Sep 17 21:31:22 CDT 2005
Contact: unicore@geez.org
Name: Daniel Yacob


I think it would be useful to have identifiers defined for numeral systems, analogous to ISO 15924 for scripts. The application would be primarily for text formatting where the numeral system would be selected from a menu for page numbers, chapters and section numbers, calendar years, ordered list markers, and clocks.

Considering that it could be a difficult and lengthy process to convince the ISO to set a standard for numeral system identifiers, defining them under TR 25 may be a better option.

Example systems that would be defined: decimal, hexadecimal-lower, hexadecimal-upper, octal, binary, roman-upper, roman-lower, aegean, coptic, ethiopic, hebrew, arabic, farsi, etc.
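(Illustrative sketch, not part of the submitted comment.) The following shows how a formatter might dispatch on such identifiers. Only a handful of the systems listed above are covered, the identifier spellings are taken from that list, and the function names `to_roman` and `format_number` are hypothetical; no standard registry of these identifiers is implied.

```python
ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert 1..3999 to upper-case Roman numerals."""
    if not 0 < number < 4000:
        raise ValueError("Roman numerals are supported for 1..3999 only")
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


# Identifier -> formatting function (a small, hypothetical subset).
FORMATTERS = {
    "decimal": lambda n: format(n, "d"),
    "hexadecimal-lower": lambda n: format(n, "x"),
    "hexadecimal-upper": lambda n: format(n, "X"),
    "octal": lambda n: format(n, "o"),
    "binary": lambda n: format(n, "b"),
    "roman-upper": to_roman,
    "roman-lower": lambda n: to_roman(n).lower(),
}


def format_number(number: int, system: str) -> str:
    """Format a number in the numeral system named by the identifier."""
    try:
        formatter = FORMATTERS[system]
    except KeyError:
        raise ValueError(f"unknown numeral system: {system!r}") from None
    return formatter(number)


if __name__ == "__main__":
    for system in FORMATTERS:
        print(system, format_number(2005, system))
```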

76 Draft UTS #37, Ideographic Variation Database

Date/Time: Sat Oct 8 03:56:31 CST 2005
Contact: WOverington@ngo.globalnet.co.uk
Name: William Overington

Public Review Issue 76 Proposed Draft UTS #37, Ideographic Variation Database. In the web page http://www.unicode.org/reports/tr37/tr37-1.html is the following.


[Ed note, from Rick: But (a point of interest to some) neither the registry nor the registrar guarantees the perpetual availability of the registered entities. Thus, it is possible at some future time for the definitions pointed to by the registry to become unavailable for query or use.

(To me as an end-user, this would be a fatal flaw in the usefulness of the registry. I could have documents containing these sequences, and yet at some future time, not be guaranteed ever to be able to find out what they mean. This contrasts with the standard itself, in which one can always find out the contents by having a copy of the standard. Somehow, users of this registry should be cautioned in this.)]

end quote

This problem could perhaps be overcome by desktop publishing the contents in text form from time to time and binding the pages into a small book in a small edition, say ten copies or whatever.

Copies could be sent to the Library of Congress in the United States of America and to the British Library in England and to the other Legal Deposit Libraries in the United Kingdom and the Irish Republic.

Thus the information would be archived and be available for reference for as long as civilization exists.

The following web page may be of interest.


In particular, the section "Definition of a publisher".

I feel that it is important to emphasise that, in United Kingdom law, a publisher can be an individual and a publication need not be in a large edition and need not be by way of trade.

Certainly, many published books are by large companies in large editions by way of trade, yet the British Library also collects items such as examples of books produced in small editions by hobbyist private presses.

I do not know what is the law on this matter in the United States of America or in other places outside the United Kingdom, yet it is possible that the great libraries of the world could ensure that the information is available for as long as those libraries exist if the registry being here considered publishes its information.

It is possible that the information need not necessarily be published in hardcopy format. It may be that a pdf (Portable Document Format) publication could be deposited and that that would be a satisfactory arrangement.

I hope that this helps.

William Overington

Date/Time: Mon Oct 10 17:39:05 CST 2005
Contact: markus.icu@gmail.com
Name: Markus Scherer

There are 256 variation selectors, not 240. PDUTS #37 seems to systematically omit the FE00..FE0F. Why? Are they not allowed in IVS, or not in registered IVS, other than if UTC registers them?
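(Illustrative sketch, not part of the submitted comment.) The count of 256 comes from two ranges: the 16 selectors at FE00..FE0F mentioned above plus the 240 supplementary selectors at E0100..E01EF (the second range is stated here from the Unicode code charts, not from the comment). The names `VS_RANGES` and `variation_selectors` are invented for this example.

```python
import unicodedata

# Inclusive code point ranges of the variation selectors.
VS_RANGES = [
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1 .. VARIATION SELECTOR-16
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17 .. VARIATION SELECTOR-256
]


def variation_selectors(ranges=VS_RANGES):
    """List every code point in the given inclusive ranges."""
    return [cp for first, last in ranges for cp in range(first, last + 1)]


if __name__ == "__main__":
    selectors = variation_selectors()
    supplementary = [cp for cp in selectors if cp >= 0xE0100]
    print(len(selectors), "variation selectors in total")
    print(len(supplementary), "of them in the supplementary range")
    print(unicodedata.name(chr(selectors[0])), "...",
          unicodedata.name(chr(selectors[-1])))
```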

"...and registered IVSes should be should be used only to restrict the rendering..." - remove one "should be "

"There is no guarantee that two IVSes using the same variation selector but on different unified ideographs have any relationship, ..." - I would add examples: e.g., equality or non-equality.

"The usefulness of a given variation sequence, and the usefulness of a collection as a whole depends..." - comma after "whole" to complete the bracketing starting after "sequence"?

"Both files are encoded in UTF-8, using U+000A as the line separator." - Why not allow several or all of the Unicode newline sequences? If someone downloads the data file to their local machine, it may not be U+000A (alone) any more.

"The identifiers for collections..." may contain an underscore, but the example uses a dash. Please make consistent. Why not allow both underscore and dash?

Suggestion/for discussion: Possible to add that over time a collection's regular expression may be changed, provided that the new regex is a superset of the previous one.

4.1 Registration of a new collection - should the official start of the 90-day period not be triggered by submitting to reporting.html and the UTC acknowledging it with a posting/forwarding to the public list? (Rather than only posting to the public list.)

Should IVD_Collections.txt have a field for a collection representative/owner? (E.g., email address)
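(Illustrative sketch, not part of the submitted comment.) On the line-separator question raised earlier in this comment, a reader of the data files could be written to tolerate whatever newline convention a download leaves behind. The sketch assumes semicolon-separated fields and `#` comments, which is a guess modeled on other Unicode data files rather than something the draft is known to specify; the sample records are made up.

```python
def parse_records(data: bytes):
    """Parse UTF-8 data into tuples of fields, accepting any newline style."""
    records = []
    # str.splitlines() splits on LF, CR, CRLF, NEL, LS, PS and a few others.
    for line in data.decode("utf-8").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments (assumed syntax)
        if not line:
            continue
        records.append(tuple(field.strip() for field in line.split(";")))
    return records


if __name__ == "__main__":
    # The same hypothetical content with mixed CRLF, CR and LF separators.
    sample = (b"82A6 E0134; Example; 23\r\n"
              b"# a comment line\r"
              b"82A6 E0135; Example; 24\n")
    for record in parse_records(sample):
        print(record)
```

A tolerant reader of this kind does not settle what the published files should contain; it only shows that consumers need not depend on U+000A alone.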

"...displayed with an older form (4) of the glyph" -> "...displayed with an older glyph (4)"?

"82A6 E0134; Examples-names; 23" - remove the ending 's' from the collection name

77 Proposed Draft UTS #39 and Proposed Update UTR #36

No feedback was received via the reporting form this period. See other L2/UTC documents for detailed discussions.

79 Proposed Updates to UAX #29: Text Boundaries and UAX #31: Identifier and Pattern Syntax

Date/Time: Sat Oct 22 13:03:15 CST 2005
Contact: jkorpela@cs.tut.fi
Name: Jukka K. Korpela

In UAX #31, at http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax the table explains the "General Description of Coverage" of both ID_Start and ID_Continue in a way that excludes ideographs. Ideographs are however mentioned in the text before the table, and the Unicode database defines them as ID_Start and ID_Continue characters.
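(Illustrative sketch, not part of the submitted comment.) The point about ideographs can be observed indirectly in Python, whose identifier syntax is based on UAX #31. Note the assumption: `str.isidentifier()` tests the XID_Start and XID_Continue properties rather than ID_Start and ID_Continue directly, so this is a proxy and not a lookup of the properties named in the comment. The helper `describe` is a name invented for this example.

```python
import unicodedata

IDEOGRAPH = "\u4E00"  # first character of the CJK Unified Ideographs block


def describe(ch: str):
    """Return (general category, usable as identifier start, usable as continue)."""
    return (
        unicodedata.category(ch),
        ch.isidentifier(),          # character alone: must be a valid start
        ("a" + ch).isidentifier(),  # after a letter: must be a valid continue
    )


if __name__ == "__main__":
    category, is_start, is_continue = describe(IDEOGRAPH)
    print(f"U+{ord(IDEOGRAPH):04X}: category={category}, "
          f"start={is_start}, continue={is_continue}")
```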

80 Proposed Update to UAX #9: The Bidirectional Algorithm

Date/Time: Fri Oct 14 15:11:56 CST 2005
Contact: paulnel@microsoft.com
Name: Paul Nelson

The statement that is made as an aside, "(Otherwise all characters with the Bidi_Mirrored property must be mirrored, and all other characters must not be mirrored.)" seems to be a little too prescriptive and too vague.

If the bidi properties are changed on a character it could theoretically break document layout and fonts that were made to deal with an incorrect assignment of the bidi mirroring property for that letter.

It is interesting that the title is "Tailor", but the word "must" is used in the text. Is the purpose to limit tailoring to mathematical characters?

Date/Time: Fri Oct 14 18:51:13 CST 2005
Contact: Laila@apple.com
Name: Laila El-Dafashy

I agree with the proposed change that all characters with the Bidi_Mirrored property must be mirrored, to ensure that the correct character code is used to express the intended semantics of the character. The exception is mathematical characters that are part of mathematical expressions or formulae, for which a higher-level protocol can tailor the mirroring action to a subset of the characters with the mirroring property. This proposed change will require a major revision to fonts.

Date/Time: Thu Oct 27 01:08:34 CST 2005
Contact: matial@il.ibm.com
Name: Matitiahu Allouche

I have two comments about the proposed changes to the text about mirroring.

1) The updated text of L4 says: "L4. A character that possesses the Bidi_Mirrored property as specified by Section 4.7, Mirrored of [Unicode] must be depicted by a mirrored glyph if the resolved directionality of that character is R."

Comment: although fairly obvious for cognoscenti, the term "resolved directionality" has not been formally defined, to the best of my knowledge. It seems to appear nowhere else in UAX #9. Additionally, the example in the next paragraph refers to "odd resolved level". I suggest unifying around the better-defined term "odd resolved level".

2) In HL6, the new text says: "(Otherwise all characters with the Bidi_Mirrored property must be mirrored, and all other characters must not be mirrored.)" Comment: in order to avoid any misunderstanding, I suggest modifying it as follows: (Otherwise all characters with the Bidi_Mirrored property and with an odd resolved level must be mirrored, and all other characters must not be mirrored.)
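(Illustrative sketch, not part of the submitted comment.) The wording suggested in comment 2 amounts to a two-part test: the character has the Bidi_Mirrored property, and its resolved embedding level is odd. The sketch below expresses that test with Python's `unicodedata.mirrored`. The function name `must_mirror` is hypothetical, and the resolved level is taken as an input because computing it requires running the full bidirectional algorithm.

```python
import unicodedata


def must_mirror(ch: str, resolved_level: int) -> bool:
    """True if the character is Bidi_Mirrored and sits at an odd resolved level."""
    has_mirrored_property = unicodedata.mirrored(ch) == 1
    is_odd_level = resolved_level % 2 == 1
    return has_mirrored_property and is_odd_level


if __name__ == "__main__":
    for ch, level in [("(", 0), ("(", 1), ("(", 2), ("a", 1)]:
        print(f"{ch!r} at level {level}: mirror={must_mirror(ch, level)}")
```

Any tailoring by a higher-level protocol, such as the mathematical case discussed in the comments above, would sit on top of this default test.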

81 Proposed Update to UAX #34: Unicode Named Character Sequences

(The issue was posted very late; no feedback was received this reporting period.)