Comments on Public Review Issues
(October 29, 2005 - January 30, 2006)

The sections below contain comments received on the open Public Review Issues as of February 1, 2006, since the previous cumulative document was issued prior to UTC #105 (November 2005).


75 Proposed Update UTR #25, Unicode Support for Mathematics
77 Proposed Draft UTS #39 and Proposed Update UTR #36
80 Proposed Update to UAX #9: The Bidirectional Algorithm
81 Proposed Update to UAX #34: Unicode Named Character Sequences
82 Representation of Gurmukhi Double Vowels
83 Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo
84 Proposed Update to UAX #29: Text Boundaries
85 Proposed Update to UAX #31: Identifier and Pattern Syntax
86 Proposed Update to UAX #15: Unicode Normalization Forms
87 Proposed Update to UAX #24: Script Names
88 Proposed Update to UAX #14: Line Breaking Properties
Unicode 5.0.0 Beta Feedback

75 Proposed Update UTR #25, Unicode Support for Mathematics

Date/Time: Wed Jan 25 04:29:34 CST 2006
Contact: philip_chastney@yahoo.com
Name: Phil Chastney
Subject: UTR 25 (Unicode and Mathematics), ver 7

good day

I understand it is still possible to submit comments on UTR 25 (Unicode and Mathematics), ver 7. I would like to make some observations on the shapes in mathematical fonts, but this online form is not really appropriate because (i) the comments run to about half a dozen pages, and (ii) they require the use of special fonts.

I was thinking I could submit this material as a PDF with one or more embedded fonts. Would that be OK?

If so, where should I send my document?

If not, is there some other channel of communication I could use?

I realise there are only a very few days remaining before some deadline or other. I would have submitted something earlier, but at the time I was under the impression that the deadline had passed.

Thank you for your time and attention.

Regards . . . /phil chastney

Editor's Note: The document has been submitted as L2/06-034.

77 Proposed Draft UTS #39 and Proposed Update UTR #36

Date/Time: Fri Jan 6 06:05:12 CST 2006
Contact: hotta@jprs.co.jp
Name: Hiro Hotta
Subject: UTR#36 : Default setting for IDNs in User Agents

Comments on UTR#36 :

We should not expect users to change the preferences of their user agents (e.g., browsers), since most users are not accustomed to making such changes. This means that default preferences must inevitably be chosen to reflect the demands of the majority of users. If such demands cannot be generalized, language-specific demands may be used instead. For example, mixing Han+Hiragana+Katakana with Latin in one word, which is not allowed in Restriction Level 2, is very common in Japan. Such mixing is also allowed in trademarks in Japan, so many brands (e.g., names of banks, retail stores, ISPs, ...) mix Han+Hiragana+Katakana and Latin. This means that Restriction Level 2 significantly diminishes the value of Japanese IDNs. I believe some languages other than Japanese have the same problem.

In summary, my specific comment is the following :

In 2.10.3 C.1, Restriction Level 2 is recommended as the default. However, I propose that "Restriction Level 3" be recommended instead. Or, if an unconditional recommendation of "Restriction Level 3" is not appropriate, the recommendation should reflect the situation of each language. One idea is: "If the user's locale is English (en) or one of its derivatives (en-*), the default is Restriction Level 2. Otherwise, the default is Restriction Level 3."
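The locale-dependent default suggested above amounts to a small lookup. The sketch below is a hypothetical illustration of the proposal, not text from UTR #36: the function name is invented, and treating `_` as equivalent to `-` (for POSIX-style locale IDs such as `en_US`) is an added assumption.

```python
def default_restriction_level(locale: str) -> int:
    """Hypothetical default per the proposal: Level 2 for en / en-*, else Level 3."""
    # Normalize POSIX-style separators ("en_US") to BCP 47 style ("en-US"),
    # then compare only the primary language subtag, case-insensitively.
    primary = locale.replace("_", "-").split("-", 1)[0].lower()
    return 2 if primary == "en" else 3


if __name__ == "__main__":
    for loc in ("en", "en-GB", "ja-JP", "fa-IR"):
        print(loc, default_restriction_level(loc))
```

Matching on the primary subtag alone means every regional English variant gets Level 2. Every other language, including Japanese, gets the more permissive Level 3.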

Hiro Hotta, JPRS (.JP registry)

80 Proposed Update to UAX #9: The Bidirectional Algorithm

Date/Time: Tue Dec 13 09:11:07 CST 2005
Contact: roozbeh@farsiweb.info
Name: Roozbeh Pournader
Report Type: Public Review Issue
Subject: Implications of PR#80 for Iranian standards

This is a notice to the UTC that part of the new Iranian standard for keyboard layouts based on Unicode (currently nearing publication) will become unusable if the change proposed in PR#80 goes forward.

To describe accurately how implementations should behave, the Iranian standard refers to the code points of Unicode characters, assuming that they will remain stable to some degree. Specifically, it refers to two Arabic-related characters, U+FD3E and U+FD3F, the ornate parentheses. These were originally not mirrored, but PR#80 proposes to change that.

We do not know what we should state in the text of the standard. If we specify code points for the two keys that generate these two characters, the behaviour of these two keys will suddenly switch when a computer's text rendering system is updated to Unicode 5.0.

Nor can we ask keyboard driver writers to use one set of characters or the other depending on whether their underlying rendering system has been updated to Unicode 5.0. In most cases these updates happen independently, and a keyboard driver is not aware of the version of the Unicode Standard used in other layers, such as the text layout level (which will switch the shapes).

This is especially important because these standards usually have a lifetime of about ten years. The previous revision of the keyboard standard was dated 1994 (and it is not yet implemented globally).

In practice, changing the bidi mirroring behaviour of these characters is simply going to wreak havoc for many users during the transition period, at every level: the text rendering level, the keyboard driver level, the bidi-aware text entry level in applications (like the one Microsoft Word has), the document level (users need to switch to the other character), etc.

The High Council of Informatics of Iran (which I represent in the Unicode Consortium) and the High Council of Information Dissemination of Iran (of whose Technical Council of Persian Language and Script in Computer Environments I am a member) both consider the change to be against any kind of stability.

If the UTC deems the changes in PR#80 necessary, HCI asks (as do I, personally) that the change in the mirroring property NOT be applied to characters likely to be in common use in Persian texts in Iran, namely U+2018, U+2019, U+201C, U+201D, U+FD3E, and U+FD3F.

From: Dominikus Scherkl
Date: 2005-12-14 02:15:23 -0800
Subject: RE: Public Review Issues Update: UAX #9 Bidi Algorithm


In UAX #9 Revision 16, Section 4, C2 still contains a reference to HL6, which should be eliminated.

Best Regards,

-- Dominikus Scherkl
XPaneon Technologies
Tel. 06023/9436-42

Date/Time: Wed Dec 14 06:46:02 CST 2005
Contact: matial@il.ibm.com
Name: Matitiahu Allouche
Subject: PRI #80 - mirroring

1) It might be worth specifying that the ban on overriding bidi mirroring does not concern characters in the PUA.

2) In the updated phrasing of L4:

a. The property is called "Bidi_Mirrored", but in the book (TUS4 p. 101) it is called "Bidi Mirrored" (without the underscore).

b. The title of section 4.7 should be "Bidi Mirrored" (as in the book) or "Bidi_Mirrored" (if it is decided to unify the terminology on this spelling), but never just "Mirrored".

3) After the updated text of section 6 "Mirroring" (of UAX#9), the reference to Section 4.3 Higher-Level Protocols is not quite relevant, since HL6 is to be deleted.

On the other hand, if a higher-level protocol is allowed in a mathematical context, perhaps HL6 should stay in place, with an appropriate qualification that it applies only in mathematical contexts.

Date/Time: Wed Jan 18 00:47:37 CST 2006
Contact: asmusf@ix.netcom.com
Name: Asmus Freytag

I have placed editorial comments for UAX#9 on [private URL]

Editor's Note: This file has been turned into a L2/UTC document, L2/06-010.

81 Proposed Update to UAX #34: Unicode Named Character Sequences

Date/Time: Mon Dec 12 19:30:07 CST 2005
Contact: sukhjinder_sidhu@hotmail.com
Name: Sukhjinder Sidhu

The Gurmukhi entries in the Provisional Named Sequences are seemingly of no particular relevance or use.


They should be removed because they are incorrectly transliterated (Pari instead of Pairin), inconsistent (why is Pari used and then Half instead of the Punjabi word 'Adha'?) and incomplete (some conjuncts, such as Pairin Haha, Rara have been recognised in the standard for a while now and are not listed).

In addition, conjuncts can take both subjoined and post-base forms, so Half Ya could also be Pairin Ya. It would be inappropriate to use or prefer one form over the other. I pointed this out in document L2/05-167, section B1, and the two entries have since been commented out. This issue is not restricted to Ya.

Date/Time: Thu Jan 26 15:18:01 CST 2006
Contact: rigvinod@gmail.com
Name: Vinod Kumar
Subject: Comments on PRev Issue 81- UAX #34

Naming a sequence of Unicode code points is a first step. An associated code point (or, if that is out of the question, a proxy code point) is needed to realize the full benefits of treating a sequence of code points as a single unit.

If we consider a single character, say, Devanagari Letter Ka, the Unicode name DEVANAGARI LETTER KA comes from the convention that it is a letter and is pronounced Ka. Unicode has assigned the code point U+0915 to it. It is this code assignment that has enabled text to be transmitted, stored and processed in myriad ways. Naming the Devanagari letter Ka DEVANAGARI LETTER KA would by itself yield little benefit. Similarly, just giving a standard name to a sequence of Unicode code points is of little use. Nowhere is text stored as a sequence of names. The very need for a Named Sequence of 'code points' shows that all text processing operates on the code points and not on the names.

The criteria for selecting a sequence of characters for naming can be strengthened. The sequence of characters should have an archetypical glyphic form.

It should be possible to assign a real or proxy code point to the archetypical glyph corresponding to the sequence to be named. Moreover, the code point should have a class distinct from the classes of the sequence codes.

Devanagari named sequences
Consonant signs
        = <0930 094d> | CONSONANT
        = CONSONANT | <094d 0930>


1. The named sequence table should not be a single table for all the scripts but a part of the code chart for the script of the named sequence.

2. The nominal shape of the sequence should be specified in the second column. A sequence of characters without an archetypical glyphic shape should not be named.

3. The sequence should fall under a particular classification. For Devanagari, it could come under Consonant signs. A sequence should not be named and proxy coded if it does not differ distinctly from its constituent characters.

4. The name of the sequence should be specified in the third column.

5. Each entry should follow the format of the other entries in the code chart, with two changes. 1) The proxy code point for the sequence should be specified in the first column as a string. The string should refer to a reserved (unused) code point in the script range and can be used within the text processing component as the proxy code point for the named sequence. If Unicode relents and assigns these code points to the named sequence, then the string can be replaced by the code point and the representation can be used for interchange as well. Until then, it should not leave the processing software. 2) In the line below, the named sequence of characters is shown within < >. The sequence can have a context before and/or a context after. The context can be specified as a class or as a specific code sequence. The context and the sequence are separated by |.
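The following is a hypothetical matcher for the two Devanagari entries listed earlier. It shows how the "context | <sequence>" notation is meant to be read. It only reports where each rule applies and does not assign any proxy code point. The consonant class (U+0915..U+0939, KA..HA, with nukta forms left out) is a simplification assumed for illustration.

```python
RA, VIRAMA = "\u0930", "\u094D"


def is_consonant(ch: str) -> bool:
    # Simplified CONSONANT class, assumed for this sketch: U+0915..U+0939 only.
    return "\u0915" <= ch <= "\u0939"


def find_reph(text: str) -> list[int]:
    """Start indices of <0930 094d> | CONSONANT (RA + VIRAMA followed by a consonant)."""
    return [i for i in range(len(text) - 2)
            if text[i] == RA and text[i + 1] == VIRAMA and is_consonant(text[i + 2])]


def find_rakar(text: str) -> list[int]:
    """Start indices of CONSONANT | <094d 0930> (VIRAMA + RA preceded by a consonant)."""
    return [i for i in range(1, len(text) - 1)
            if text[i] == VIRAMA and text[i + 1] == RA and is_consonant(text[i - 1])]


if __name__ == "__main__":
    print(find_reph("\u0930\u094D\u092E"))   # RA VIRAMA MA
    print(find_rakar("\u0915\u094D\u0930"))  # KA VIRAMA RA
```

The context characters are tested but not included in the reported span. This mirrors how the notation separates the context from the named sequence with |.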

Vinod Kumar
Project IndiX

82 Representation of Gurmukhi Double Vowels

Date/Time: Mon Dec 12 19:37:46 CST 2005
Contact: sukhjinder_sidhu@hotmail.com
Name: Sukhjinder Sidhu

I recommend the ordering LEFT TOP BOTTOM RIGHT. This seems to be the most logical order for scripts written from left to right, top to bottom. I see no reason why this same principle could not be applied to other Indic scripts.

A top-to-bottom and left-to-right approach is already used on Microsoft Windows.

Telugu: లైౕ ల ె ౖ
Malayalam: റോ റ േ ാ

However, other Indic scripts do not yet seem to have any cases that use other combinations (e.g., left + bottom).

83 Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo

Editor's Note: See also documents L2/06-040 and L2/06-042.

Date/Time: Wed Dec 14 02:19:07 CST 2005
Contact: grinchuk@att.net
Name: Mikhail Grinchuk

Regarding issue #83 "Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo"

The proposal to change the glyph is misleading. Why?

1. In the Church Slavonic language, there exists a diacritical mark called "titlo".

2. In the Church Slavonic language, there exists a letter "omega".

3. It is possible to combine omega and titlo, and the result MUST look like the one currently shown in the glyphs for U+047C/U+047D.

4. But this combination is probably very rare, maybe completely unused. Theoretically, it may represent the numeral value 800, but in practice typographers use the letter "ot" (U+047E/U+047F), with or without an extra titlo over it.

5. In the Church Slavonic language, there exists a special letterform used for "o" in only two words: the exclamations "o!" and "ole!". This is the glyph proposed in issue #83 as a replacement for U+047C/U+047D.

6. But it is NOT "omega with titlo"!!! The pair of diacritical signs over the omega has nothing in common with titlo!

The Slavonic diacritical sign "titlo" plays two roles:

(a) it is the mark of abbreviation ("dv~a" = "deva" [Virgin], "gd~i" = "Gospodi!" [O Lord!], "mr~ia" = "Maria", etc.),

(b) it is used to denote numbers ("to~a" = 371, because "t~" = 300, "o~" = 70, and "a~" = 1).

In the case of the exclamations "o!" and "ole!", neither of the two reasons given above for using the term "titlo" applies: they are neither abbreviations nor numbers.

7. The correct name of the diacritical mark above the first letter (omega) in the Church Slavonic words "o!" and "ole!" is "velikiy apostrof" ("the great apostrophe"). See, for example, pp. 275, 355, 357, 383, 405, 438, 638 in Vatroslav Jagic's book "Codex Slovenicus Rerum Grammaticarum", or "Rassuzhdeniya Yuzhnoslavyanskoy i Russkoy Stariny o Tserkovno-Slavyanskom Yazyke", Berlin, 1896 (reprint: Muenchen, Wilhelm Fink Verlag, 1968, as Vol. 25 in the series "Slavische Propylaeen"). Jagic reproduces numerous medieval Slavonic treatises (of both South Slavic and Russian origin) on the alphabet, writing, grammar and similar subjects.

8. The Slavonic "great apostrophe" originates in the Greek combination U+1FCF of two diacritical signs: a weak aspiration sign (psili) plus a circumflex (perispomeni). See U+1F66/U+1F6E, the Greek origin of the Slavonic "omega with great apostrophe". The oldest Slavonic books (like Meletiy Smotricky's grammar of 1619) use a glyph very similar to U+1F66/U+1F6E in Slavonic text.

9. But the shapes of both the diacritical signs and the omega itself have changed significantly over the centuries, so the "great apostrophe" can no longer be considered a pair of objects. Moreover, it is used only with the letter omega, and the omega under the great apostrophe is traditionally printed in a very special way (significantly lower and much wider than usual). As a result, probably 99.9% of the people who currently know and use Church Slavonic do not know the correct name of the diacritical mark and refer to the whole combination as "krasivaya omega" (a "nice" or "beautiful" omega).
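Point 6(b) above describes the numeric use of titlo, in which letters carry additive values (т = 300, о = 70, а = 1, so the numeral reads 371). The following is a minimal sketch, assuming plain additive summation, that ignores U+0483 COMBINING CYRILLIC TITLO. The table is deliberately partial: it covers units, tens, a few hundreds, and 800 written with either omega or "ot" (point 4). The thousands sign and variant letterforms are not handled.

```python
TITLO = "\u0483"  # COMBINING CYRILLIC TITLO

# Partial table of numeric letter values (lowercase forms only).
NUMERAL_VALUES = {
    # units: а в г д є ѕ з и ѳ
    "\u0430": 1, "\u0432": 2, "\u0433": 3, "\u0434": 4, "\u0454": 5,
    "\u0455": 6, "\u0437": 7, "\u0438": 8, "\u0473": 9,
    # tens: і к л м н ѯ о п ч
    "\u0456": 10, "\u043A": 20, "\u043B": 30, "\u043C": 40, "\u043D": 50,
    "\u046F": 60, "\u043E": 70, "\u043F": 80, "\u0447": 90,
    # selected hundreds: р с т, plus 800 as omega (ѡ) or "ot" (ѿ)
    "\u0440": 100, "\u0441": 200, "\u0442": 300,
    "\u0461": 800, "\u047F": 800,
}


def numeral_value(word: str) -> int:
    """Sum the letter values of a numeral written with titlo (additive sketch)."""
    total = 0
    for ch in word:
        if ch == TITLO:
            continue  # titlo marks the word as a number; it has no value itself
        total += NUMERAL_VALUES[ch]  # KeyError for letters outside the table
    return total


if __name__ == "__main__":
    print(numeral_value("\u0442\u043E\u0483\u0430"))  # т о҃ а
```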


1. The existing glyphs U+047C/U+047D are "omega with titlo". The name corresponds to the glyphs, but these glyphs are not in use.

2. The correct name for the "new" version of the glyphs U+047C/U+047D may be "omega s velikim apostrofom" (this is both the Russian and the Church Slavonic name, since they are the same), "omega with velikiy apostrof" (using the Russian / Church Slavonic name for the diacritical mark only), or "omega with great apostrophe".

3. There are three possible correct solutions:

3.1. change both glyphs U+047C/U+047D and names;

3.2. add omega with great apostrophe as separate glyphs (both lower- and uppercase variants required) and leave existing U+047C/U+047D as is;

3.3. add omega with great apostrophe as separate glyphs (both lower- and uppercase variants required) and declare U+047C/U+047D as obsolete.

Best wishes!

Date/Time: Mon Jan 9 05:25:05 CST 2006
Contact: ralph.cleminson@port.ac.uk
Name: Professor R.M. Cleminson
Subject: Cyrillic Omega with Titlo, Re: Public Review Issue No.83: Changing Glyph for U+047C/U+047D Cyrillic Omega with Titlo

While the proposed new glyph is in some respects an improvement, it is still not really a correct representation of this character. The salient point is that this is a broad omega: the body of the character is wider and shallower than that of the Cyrillic omega, and this character is NOT the equivalent of U+0460/U+0461 plus a diacritic. I can provide you with examples from printed texts from the 16th to 20th centuries, if you give me an address to send them to.

84 Proposed Update to UAX #29: Text Boundaries

No feedback was received via the reporting form this period.

85 Proposed Update to UAX #31: Identifier and Pattern Syntax

No feedback was received via the reporting form this period.

86 Proposed Update to UAX #15: Unicode Normalization Forms

Date/Time: Sat Jan 14 16:20:01 CST 2006
Contact: kent.karlsson14@comhem.se
Name: Kent Karlsson
Report Type: Public Review Issue
Subject: pri 86, UAX 15

"For round-trip compatibility with existing standards, Unicode has encoded many entities that are really variants of existing nominal characters. The visual representations of these characters are typically a subset of the possible visual representations of the nominal character. These are given compatibility decompositions in the standard."

I don't agree with these statements. There are a few characters (not used for math) that have compatibility decompositions, yet were not encoded "for compatibility". Among those characters are long s, the ij ligature, no-break space, and a number of modifier letters.


The paragraph continues with another falsehood: "However, some characters with compatibility decompositions are used in mathematical notation to represent distinction of a semantic nature; replacing the use of distinct character codes by formatting may cause problems."

With the proper markup, there is no problem at all in maintaining the semantic distinction.


Further, some "compatibility characters" are given canonical decompositions (the Angstrom sign and the Ohm sign, and indeed all canonically decomposable characters), and some have no decomposition at all other than to themselves (like the JIS symbol).


"spacing accents (without a dotted circle) may be used to represent nonspacing accents"

I'd suggest rephrasing that (as that formulation is misleading):

"in this document, for printability, spacing accents may be used to represent nonspacing accents"


"Hangul decomposition could also be expressed this way. All LVT syllables decompose into an LV syllable plus an T jamo. The LV syllables themselves decompose into an L jamo plus a T jamo. Thus the Hangul canonical decompositions are fundamentally the same as the other canonical decompositions in terms of the way they decompose. This analysis can also be used to produce more compact code than what is given below."

Since the UTC accepted my suggestion to formally express the Hangul decompositions that way, the quoted paragraph should not say "could" but "is".

Annexes A10.2 and A10.3 on Hangul should be rewritten to take into account the new arithmetic decompositions that the UTC accepted. The resulting normal forms are the same, but the steps are different: they now (mathematically) follow the general approach, and handling Hangul as a special case is only an optimisation (of data storage, not really of computation).
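The two-step arithmetic decomposition discussed above can be sketched as follows. The constants are the standard Hangul syllable constants (SBase = U+AC00, LBase = U+1100, VBase = U+1161, TBase = U+11A7). The function names are illustrative only. Each precomposed syllable maps to exactly two code points: an LVT syllable becomes an LV syllable plus a T jamo, and an LV syllable becomes an L jamo plus a V jamo. The full decomposition is then obtained by applying these pair mappings recursively, as for any other canonical decomposition.

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def hangul_pair_decomposition(s: int):
    """Two-code-point canonical mapping of a precomposed syllable, or None."""
    s_index = s - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    t_index = s_index % T_COUNT
    if t_index:  # LVT -> LV + T
        return (s - t_index, T_BASE + t_index)
    # LV -> L + V
    return (L_BASE + s_index // N_COUNT, V_BASE + (s_index % N_COUNT) // T_COUNT)


def full_decomposition(cp: int) -> list:
    """Apply the pair mappings recursively, like any other canonical decomposition."""
    pair = hangul_pair_decomposition(cp)
    if pair is None:
        return [cp]
    first, second = pair
    return full_decomposition(first) + full_decomposition(second)


if __name__ == "__main__":
    print(" ".join(f"U+{cp:04X}" for cp in full_decomposition(0xD7A3)))
    # prints "U+1112 U+1175 U+11C2"
```

Only the first element of an LVT pair is itself decomposable. The recursion on both elements is kept anyway, so that the function reads like the generic algorithm.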

87 Proposed Update to UAX #24: Script Names

Date/Time: Fri Jan 20 21:17:54 CST 2006
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy

I see almost no impact from this change (except for some regular expressions that match non-standard characters and treat them along with Common characters).

But in any case, any string containing non-standard characters is handled unpredictably once these characters are assigned and given a script property value other than Common.

So new strings that pass through algorithms based on old versions of the UCD are already affected. This will not change after the proposed update, when currently unassigned characters are later assigned and moved again from the "Unknown" script to some other script.

However, I see a significant change if a process currently expects that any characters matching the regular expression "[^[:Common:]]" are assigned and have stable normalization and stable normative properties. With the change, it will also be necessary to exclude [:Unknown:] from the character range above.

Note that this is not strictly related to regular expressions. Programs may also perform individual tests on normative properties using some character API (for example the java.lang.Character class in Java) to determine the behavior of the process. Such a program would first determine whether the string contains only code points assigned to standard characters (with stable normative properties and thus stable normalizations). It may then make additional tests only to subclassify the "Common" characters, assuming that all non-Common characters are standardized in the implemented Unicode version and any of its successors.

Given that this UAX defines a normative property of characters, such an assumption seems valid according to the standard, so existing processes may fail with revised versions of the UCD, unless the application supplies an explicit versioning request to the API (which may be implemented separately as part of an OS service, library, or language). For this reason, I suggest that this UAX #24 change, if accepted, be applied only with a new version number of the UCD or of the standard (preferably with a major release, probably 5.0), and not earlier as an independent update.

If Unicode 5.0 is released before this change is accepted, the change may have to be postponed for months before it can effectively be implemented and applied. (Other similar proposed changes to normative properties or annexes should follow the same versioning policy.)

Isn't there another existing standard character property or regular expression that matches unassigned characters without using the new Script property value "Unknown", so that regular expressions continue to exclude unassigned characters independently of the version of the UCD?
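The test described above can be made independent of the Script default. One way is to detect unassigned code points through General_Category: the value Cn (Unassigned) covers unassigned code points as well as noncharacters. The following is a minimal sketch, assuming a `script_of` lookup supplied by the caller. That lookup is hypothetical, because Python's standard `unicodedata` module has no Script property. The result still depends on the UCD version bundled with the Python runtime.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    # gc=Cn covers unassigned code points and noncharacters; this check does
    # not depend on whether the Script default is Common or Unknown.
    return unicodedata.category(ch) != "Cn"


def is_standard_non_common(ch: str, script_of) -> bool:
    """True for assigned characters whose Script is neither Common nor Unknown.

    `script_of` is a hypothetical callable returning a Script value name,
    e.g. one built from Scripts.txt.
    """
    return is_assigned(ch) and script_of(ch) not in ("Common", "Unknown")
```

Because the check excludes both values, it gives the same answer whether the script data defaults unlisted code points to Common or to Unknown.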

Date/Time: Sun Jan 22 21:38:32 CST 2006
Contact: markus.icu@gmail.com
Name: Markus Scherer
Subject: 87 PU-UAX #24: Script Names - Unknown

tr24-8.html says: Note to Reviewers: there is a proposal to add a new special value for script name Unknown which is to be used as the default value for unassigned characters, instead of having Common serve that function. The proposal has not yet been approved by the UTC, but is reflected in this draft so that it can be reviewed more widely. It is not yet reflected in Scripts.txt in the UCD.

and Scripts.txt so far contains:

# All code points not explicitly listed for Script
# have the value Common (Zyyy).

This means that parsers of Scripts.txt have so far automatically assigned Common (Zyyy) to all code points not listed in the file, including all unassigned ones.

If the UTC approves the PRI #87 proposal for adding Unknown (Zzzz), then I make the following request, for parser stability and to avoid usage errors when implementations are upgraded. Scripts.txt should continue to omit, at most, code points with the Common (Zyyy) script value, but not those with the new Unknown (Zzzz) value. In other words, Scripts-5.0.0.txt would explicitly list all code points with the new Unknown (Zzzz) value. The file would be slightly larger than if Unknown values were omitted as well, but it would be safer to parse. Further, the documentation (for example, the quoted comment in Scripts.txt) would not need to be changed, and no special considerations for upgraders would need to be called out. (It might also result in fewer changes to the Scripts.txt generator program.)

For simplicity of generation and parsing, *all* code points could even be listed regardless of whether they have special values assigned.
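The parsing hazard described above shows up in a minimal Scripts.txt reader. In this sketch, the function name and the `default` parameter are illustrative assumptions. A parser written against the current file comment fills unlisted code points with Common. If Unknown values were also omitted from the file, such a parser would silently report Common for unassigned code points. Listing the Unknown ranges explicitly, as requested, keeps that old default correct.

```python
def parse_scripts(lines, default="Unknown"):
    """Parse Scripts.txt-style lines into a code point -> Script lookup.

    Data lines look like "0041..005A    ; Latin # comment" or
    "0020          ; Common # comment". Unlisted code points get `default`:
    "Common" under the current file comment, "Unknown" under PRI #87.
    """
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cps, script = (field.strip() for field in data.split(";"))
        first, _, last = cps.partition("..")
        ranges.append((int(first, 16), int(last or first, 16), script))

    def script_of(cp: int) -> str:
        for lo, hi, name in ranges:
            if lo <= cp <= hi:
                return name
        return default

    return script_of
```

The linear scan keeps the sketch short. A real implementation would sort the ranges and use `bisect`.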


88 Proposed Update to UAX #14: Line Breaking Properties

Date/Time: Fri Jan 27 18:24:08 CST 2006
Contact: heninger@us.ibm.com
Name: Andy Heninger

Here are my comments from reviewing the public review draft of UAX #14: Line Breaking Properties.

Overall, I think this draft is pretty clean.

From Section 2

I think that all discussion of the pair table implementation should be removed from the general description of Line Breaking. Having the pair table description & code available as a sample implementation is a good thing, but it is not a fundamental or defining part of the specification, which should be (and is) complete without it.

From section 4.3, conformance & Tailoring

The tailoring restrictions appear to make it impossible for a tailoring to add new hard break characters, because rules LB7, LB11 and LB12 cannot be overridden and the fundamental break and space classes cannot be changed.

I think it might be better to say that no character pre-assigned to a non-tailorable class can be reassigned to a different class, but that a tailoring is allowed to reassign any character from a tailorable class into one of the non-tailorable classes. In other words, tailorings are forbidden to tamper with the behavior of predefined breaks, spaces, etc., but they are free to add new ones, especially from the XX (Unknown) class.
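The proposed rule can be stated as a one-line check on each reassignment a tailoring makes. In this sketch, the `NON_TAILORABLE` set is a placeholder assumed for illustration and is not copied from the draft. The authoritative list is whatever Section 4.3 of UAX #14 specifies.

```python
# Placeholder set of non-tailorable line break classes (assumption, not from the draft).
NON_TAILORABLE = {"BK", "CR", "LF", "NL", "SP", "ZW", "GL"}


def reassignment_allowed(old_class: str, new_class: str) -> bool:
    """Proposed rule: characters may leave only tailorable classes.

    Moving a character *into* a non-tailorable class (e.g. XX -> BK) is allowed;
    moving one *out of* a non-tailorable class (e.g. SP -> AL) is not.
    """
    return old_class == new_class or old_class not in NON_TAILORABLE
```

Under this check, a tailoring could add a new hard break character by moving it from XX to BK. It could not change the behavior of an existing space.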

Section 4.4, a typo

> Because of the way the specification is set up, __HL3__ and HL3 have no effect on the results for text.

This should read "HL2 and HL3".

Section 6.1, LB 12, a typo

The line / rule
GL ×
appears twice.

-- Andy Heninger

Unicode 5.0.0 Beta Feedback

Date/Time: Thu Jan 5 01:10:45 CST 2006
Contact: chrislit@crosswire.org
Name: Chris Little
Report Type: Error Report
Subject: properties of U+10341 GOTHIC LETTER NINETY

U+10341 GOTHIC LETTER NINETY has incorrect properties (as of the 5.0 beta). Its properties should correspond to those of U+1034A GOTHIC LETTER NINE HUNDRED (following the change made to that character at the UTC meeting on 2005-02-09):

General_Category should be Nl (instead of the current Lo).
Numeric value should be 90.

I am literate in Gothic and can attest that, within the Gothic corpus, U+10341 is never used phonemically, only numerically.

Date/Time: Thu Jan 5 19:34:35 CST 2006
Contact: typhlosion@gmail.com
Name: Benjamin Scarborough
Subject: Possible 5.0.0 Beta Error

The counting rod numerals located at 1D360..1D371, based on perceived meaning and usage, should have a Numeric_Type of numeric, and Numeric_Values in accordance with the character names.

In other words, the lines in question of UnicodeData-5.0.0d8.txt should follow this pattern, shown here for U+1D369:

	1D369;COUNTING ROD TENS DIGIT ONE;No;0;L;;;;10;N;;;;;

Field 9 is actually currently blank for all 18 characters.
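The reference to "field 9" counts fields from 1. With zero-based indexing, the Numeric_Value field is index 8, right after the decimal-digit (6) and digit (7) fields. The following minimal parser uses descriptive field names, which are an assumption of this sketch. It derives Numeric_Type from those three fields, as they are encoded in UnicodeData.txt. Characters whose numeric values come only from other data files are not covered.

```python
FIELD_NAMES = (
    "code", "name", "general_category", "canonical_combining_class",
    "bidi_class", "decomposition", "decimal_digit", "digit", "numeric",
    "bidi_mirrored", "unicode_1_name", "iso_comment",
    "uppercase", "lowercase", "titlecase",
)


def parse_unicode_data_line(line: str) -> dict:
    """Split one UnicodeData.txt line into its 15 semicolon-separated fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


def numeric_type(record: dict):
    """Derive Numeric_Type from the decimal, digit and numeric fields."""
    if record["decimal_digit"]:
        return "Decimal"
    if record["digit"]:
        return "Digit"
    if record["numeric"]:
        return "Numeric"
    return None


if __name__ == "__main__":
    rec = parse_unicode_data_line("1D369;COUNTING ROD TENS DIGIT ONE;No;0;L;;;;10;N;;;;;")
    print(rec["numeric"], numeric_type(rec))
```

With field 9 blank, as reported for the beta data, `numeric_type` returns `None` for all 18 counting rod numerals.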

Editor's Note: Ken Whistler has already propagated this change into the 5.0.0 beta data files.

Date/Time: Sat Jan 14 18:16:42 CST 2006
Contact: typhlosion@gmail.com
Name: Benjamin Scarborough
Report Type: Error Report
Subject: Error in StandardizedVariants.txt

The comment for the sequence 2269 FE00 is "GREATER-THAN AND NOT DOUBLE EQUAL," but the character U+2269 is GREATER-THAN BUT NOT EQUAL TO.

This error has been in StandardizedVariants.txt since Unicode 4.0.

Editor's Note: This has already been fixed in the latest draft. 2006/1/18.

Date/Time: Wed Jan 18 14:06:53 CST 2006
Contact: future@shiny.co.il
Name: Ilya Konstantinov
Report Type: Public Review Issue
Subject: Feedback for Unicode 5.0.0: HEBREW PUNCTUATION MAQAF is a Dash-character


I wish to propose two changes to the properties of U+05BE HEBREW PUNCTUATION MAQAF:

Change #1:

General Category: gc=Po (current) --> gc=Pd
Relevant data file: UnicodeData.txt

Change #2:

Binary property Dash: False (current) --> True
Relevant data file: PropList.txt

In Hebrew, the Maqaf character serves the same purpose that the hyphen does in English: joining words together (as opposed to separating them, as the dash does). It also looks like a horizontal line, albeit placed at the top of the letters rather than at mid-height.

This change will both:

1. Add to the correctness, and

2. Allow the folding rule which turns all gc=Pd characters into HYPHEN-MINUS to function on the HEBREW PUNCTUATION MAQAF character. This is called for, since in modern computerized Hebrew texts, the HYPHEN-MINUS is often used instead of HEBREW PUNCTUATION MAQAF, being easier to enter from a standard keyboard.
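The folding described in point 2 can be sketched with Python's `unicodedata` module. This sketch is not a specification of any particular folding standard. Which characters it folds depends entirely on the General_Category data in the runtime's Unicode database. With data in which U+05BE is gc=Po, as in the 5.0 beta, MAQAF would pass through unchanged. The proposed change would make it fold to HYPHEN-MINUS.

```python
import unicodedata


def fold_dashes(text: str) -> str:
    """Replace every gc=Pd (Dash_Punctuation) character with U+002D HYPHEN-MINUS."""
    return "".join("-" if unicodedata.category(ch) == "Pd" else ch for ch in text)


if __name__ == "__main__":
    print(fold_dashes("a\u2010b\u2013c"))  # HYPHEN and EN DASH are both gc=Pd
```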