Bidirectional Category Proposal #2

Authors: John I. McConnell JohnMcCo@microsoft.com,
F. Avery Bishop AveryB@microsoft.com, David Brown DBrown@microsoft.com,
Majd Abbar A-MajdA@microsoft.com, Ronen Yacobi A-RonenY@microsoft.com

18-Nov-1997

Proposal

This memo proposes changing the bidirectional category of the SOLIDUS character in the Unicode 2.0 character database from European Separator to Common Separator. The change alters the visual order of text that combines SOLIDUS with text from right-to-left writing systems such as Arabic and Hebrew. The overall intent of the proposal is to better match such behavior with user expectations and existing practice.

If the Consortium accepts the proposal, the entries in Table 3-5 on page 3-17 and Table 4-4 on page 4-11 of the Unicode Standard would also need to change. No changes to the Unicode bidirectional algorithm itself are required.
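As a quick way to inspect the category under discussion, the following sketch queries the bidirectional category of a few characters with Python's standard unicodedata module. Python is used here only for illustration and is not part of the proposal. The values returned reflect the Unicode version bundled with the interpreter, not Unicode 2.0, and the helper name bidi_category is hypothetical.

```python
import unicodedata

def bidi_category(ch):
    """Return the bidirectional category abbreviation for one character.

    The answer comes from the Unicode database shipped with the Python
    interpreter, which is assumed to be much newer than Unicode 2.0.
    """
    return unicodedata.bidirectional(ch)

# ES = European Separator, CS = Common Separator, EN = European Number.
for ch in "/,1":
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {bidi_category(ch)}")
```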

Rationale

With the introduction of the first Unicode-based software in the Middle East, users now have some experience with conversion of existing data to Unicode. Although the transition has been smooth, there have been some difficulties with fractions.

Test Cases

This section shows the effect of the proposed changes on two important cases: fractions and dates. In each test case we follow the same conventions as the Unicode 2.0 book: uppercase letters correspond to strong right-to-left characters, whereas lowercase letters correspond to strong left-to-right characters. We have also included examples using Arabic and Hebrew text. In all the examples, except as noted, the embedding level is right-to-left. Results that differ from the current values in Unicode 2.0 are shaded.

The proposed change affects only the resolution of weak neutrals in steps P0 through P5 of the Unicode Bidirectional Algorithm. This limits the changes in behavior to cases where SOLIDUS is adjacent to numbers.

Fractions

Table 2 Fractions

Logical Order            Current Visual Order    Proposed Visual Order
ADD 1/2 CUP (Arabic)     PUC 2/1 DDA             PUC 1/2 DDA
ADD 1/2 CUP (Hebrew)     PUC 1/2 DDA             PUC 1/2 DDA

Dates

There are many date formats, but the proposed changes would affect one frequently used form.

Table 3 Dates

Logical Order                 Current Visual Order    Proposed Visual Order
MEET ON 01/23/45 (Hebrew)     01/23/45 NO TEEM        01/23/45 NO TEEM
MEET ON 01/23/96 (Arabic)     96/23/01 NO TEEM        01/23/96 NO TEEM
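The rows of the fraction and date tables can be reproduced with a small simulation. The sketch below is a deliberately simplified model, not the Unicode Bidirectional Algorithm itself: it assumes a right-to-left paragraph containing only uppercase (right-to-left) letters, European digits, SOLIDUS, and spaces, and it does not follow the numbered steps of the algorithm. The function name visual_order and its parameters are hypothetical.

```python
def visual_order(text, script, solidus_class):
    """Simplified visual ordering for a right-to-left paragraph.

    script:        "hebrew" or "arabic" (what the uppercase letters stand for)
    solidus_class: "ES" (European Separator) or "CS" (Common Separator)
    Assumes no left-to-right letters occur in `text`.
    """
    strong = "AL" if script == "arabic" else "R"
    types = []
    for ch in text:
        if ch.isupper():
            types.append(strong)
        elif ch.isdigit():
            types.append("EN")
        elif ch == "/":
            types.append(solidus_class)
        else:
            types.append("ON")

    # European digits that follow Arabic letters behave as Arabic numbers.
    last_strong = None
    for i, t in enumerate(types):
        if t in ("R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"

    # A single separator between two numbers joins them. ES joins only
    # European numbers; CS joins two numbers of the same kind.
    for i in range(1, len(types) - 1):
        prev, nxt = types[i - 1], types[i + 1]
        if types[i] == "ES" and prev == nxt == "EN":
            types[i] = "EN"
        elif types[i] == "CS" and prev == nxt and prev in ("EN", "AN"):
            types[i] = prev

    # Numbers sit one level above the right-to-left text; everything else,
    # including separators that did not join, takes the embedding level.
    levels = [2 if t in ("EN", "AN") else 1 for t in types]

    # Reverse runs from the highest level down to the lowest.
    chars = list(text)
    for level in (2, 1):
        i = 0
        while i < len(chars):
            if levels[i] >= level:
                j = i
                while j < len(chars) and levels[j] >= level:
                    j += 1
                chars[i:j] = reversed(chars[i:j])
                i = j
            else:
                i += 1
    return "".join(chars)


for logical, script in [("ADD 1/2 CUP", "arabic"), ("ADD 1/2 CUP", "hebrew"),
                        ("MEET ON 01/23/45", "hebrew"),
                        ("MEET ON 01/23/96", "arabic")]:
    current = visual_order(logical, script, "ES")
    proposed = visual_order(logical, script, "CS")
    print(f"{logical} ({script}): current={current!r} proposed={proposed!r}")
```

Under these assumptions, only the Arabic rows change between the two categories, because a European Separator joins only European numbers while a Common Separator also joins the Arabic numbers that digits become after Arabic letters.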

Conclusion

Without explicit formatting, it is impossible for both dates and fractions to display properly. Although the date change is undesirable, our users would prefer to have fractions correct rather than dates. There seem to be several reasons for this preference:

Although there are some tradeoffs, the authors believe that this proposal would more closely match user expectations for visual order of right-to-left text and expedite the development of software for regions that use such text. This improvement would promote the acceptance of Unicode for an important emerging software market.

Proposed Correction to Mirroring List

Both Unicode 2.0 and ISO 10646 define a normative list of mirrored characters. We believe that four characters have been omitted from these lists. Specifically, the four characters in Table 1 should be added to the lists of characters with the mirroring property.

Table 1 Characters to Add to the Mirroring List

Code Point    Glyph    Unicode 2.0 Name
0x00AB        «        LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
0x00BB        »        RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
0x2039        ‹        SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x203A        ›        SINGLE RIGHT-POINTING ANGLE QUOTATION MARK

Although the use of these characters varies, the mirroring behavior is unambiguous. For example, those printing traditions that use the left-pointing quotation mark to begin a left-to-right quotation use the right-pointing quotation mark to begin a right-to-left quotation and vice versa.

This correction would also reconcile the mirroring behavior of these characters with their cross-referenced characters such as 0x226A MUCH LESS-THAN and 0x300A LEFT DOUBLE ANGLE BRACKET. All of these related characters are listed as mirroring.

The effect of the correction would be to add these four characters to Table 4-7 in the Unicode 2.0 book.
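The mirroring property of the four characters, and of the related characters mentioned above, can be inspected with a short script. This is only an illustrative sketch using Python's standard unicodedata module. It reports the property as recorded in the Unicode database bundled with the interpreter, which is assumed to be a later version than Unicode 2.0, and the names PROPOSED, RELATED, and mirrored_report are hypothetical.

```python
import unicodedata

# The four quotation marks discussed in this correction.
PROPOSED = ["\u00AB", "\u00BB", "\u2039", "\u203A"]
# Cross-referenced characters cited as already mirroring.
RELATED = ["\u226A", "\u300A"]

def mirrored_report(chars):
    """Map each code point label to True if the character is mirrored."""
    return {f"U+{ord(c):04X}": bool(unicodedata.mirrored(c)) for c in chars}

for label, flag in mirrored_report(PROPOSED + RELATED).items():
    print(label, "mirrored" if flag else "not mirrored")
```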