L2/11-427

Feedback on Unicode Version 6.1 Beta

Date/Time: Fri Oct 28 17:11:52 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: IdnaMappingTable.txt minor formatting issue


In IdnaMappingTable.txt, inline comments are usually formatted with a 
space between the # and the Unicode age value.
In the 6.1 version, the space is omitted when a data line has long data 
values. This is slightly ugly and causes gratuitous diffs.

For example,

2900..2A0B    ; valid                  ;      ; NV8    # 3.2  RIGHTWARDS TWO-HEADED ARROW WITH VERTICAL STROKE..SUMMATION WITH INTEGRAL
2A0C          ; mapped                 ; 222B 222B 222B 222B #3.2 QUADRUPLE INTEGRAL OPERATOR
2A0D..2A73    ; valid                  ;      ; NV8    # 3.2  FINITE PART INTEGRAL..EQUALS SIGN ABOVE TILDE OPERATOR
2A74          ; disallowed_STD3_mapped ; 003A 003A 003D #3.2  DOUBLE COLON EQUAL
2A75          ; disallowed_STD3_mapped ; 003D 003D     # 3.2  TWO CONSECUTIVE EQUALS SIGNS
2A76          ; disallowed_STD3_mapped ; 003D 003D 003D #3.2  THREE CONSECUTIVE EQUALS SIGNS
2A77..2ADB    ; valid                  ;      ; NV8    # 3.2  EQUALS SIGN WITH TWO DOTS ABOVE AND TWO DOTS BELOW..TRANSVERSAL INTERSECTION
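
A check along the following lines could flag such lines before release. This is only a minimal sketch: the function name lines_missing_comment_space is hypothetical and not part of any Unicode tooling, and it assumes that the first # on a data line starts the inline comment.

```python
def lines_missing_comment_space(lines):
    """Return (line_number, line) pairs for data lines whose inline
    comment has no whitespace right after the '#'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip blank lines and whole-line comments; only data lines matter here.
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: the first '#' on a data line starts the inline comment.
        index = line.find("#")
        if index == -1 or index + 1 >= len(line):
            continue
        if not line[index + 1].isspace():
            hits.append((number, line.rstrip("\n")))
    return hits


sample = [
    "2A0C          ; mapped                 ; 222B 222B 222B 222B #3.2 QUADRUPLE INTEGRAL OPERATOR",
    "2A75          ; disallowed_STD3_mapped ; 003D 003D     # 3.2  TWO CONSECUTIVE EQUALS SIGNS",
]

for number, line in lines_missing_comment_space(sample):
    print(number, line)
```

On the two sample lines above, only the first (2A0C) is reported.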


Date/Time: Mon Oct 31 18:48:20 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.1 SpecialCasing.txt @missing needs another semicolon


Unicode 6.1 has this default-value line in SpecialCasing.txt:

# @missing: 0000..10FFFF; <slc>; <stc>; <suc>

There needs to be another semicolon at the end according to the documentation in the header:

# The entries in this file are in the following machine-readable format:
#
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>

so the @missing line should be changed to

# @missing: 0000..10FFFF; <slc>; <stc>; <suc>;
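
The difference can be checked mechanically by counting semicolons. This is a minimal sketch, assuming that the documented format terminates each of the four mandatory fields with a semicolon; the names semicolon_count and REQUIRED_SEMICOLONS are hypothetical.

```python
MISSING_PREFIX = "# @missing:"

# Assumption from the header format: <code>, <lower>, <title> and <upper>
# are each terminated by a semicolon, so at least four are expected.
REQUIRED_SEMICOLONS = 4


def semicolon_count(missing_line):
    """Count the semicolons in the value part of an @missing line."""
    if not missing_line.startswith(MISSING_PREFIX):
        raise ValueError("not an @missing line")
    return missing_line[len(MISSING_PREFIX):].count(";")


current = "# @missing: 0000..10FFFF; <slc>; <stc>; <suc>"
proposed = "# @missing: 0000..10FFFF; <slc>; <stc>; <suc>;"

print(semicolon_count(current) >= REQUIRED_SEMICOLONS)   # False
print(semicolon_count(proposed) >= REQUIRED_SEMICOLONS)  # True
```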


Date/Time: Tue Nov 1 15:51:33 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: UCA 6.1 bug in FractionalUCA.txt


UCA 6.1 has the same primary collation weight for all spaces.
In FractionalUCA.txt, that collation weight is a single byte 04.

That file also defines top-of-reordering-group primary weights, and the top-of-spaces is 04 FE:

FDD0 0042;	[04 FE, 05, 05]	# Special final value for reordering token

This is wrong.
The space weight of 04 is a prefix of the top-of-spaces weight, which is forbidden.
It also means that no character can be tailored primary-after any space and still 
reorder with the normal spaces.

For a fix, the range of lead bytes for spaces should be restored to 04..05, moving 
up every following primary weight, and the top-of-spaces weight needs to be restored
to 05 FE.
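
The prefix conflict can be illustrated with a short check on the byte sequences. This is a minimal sketch with hypothetical helper names; it only compares the weights quoted above and does not model the full weight assignment in FractionalUCA.txt.

```python
def parse_weight(text):
    """Convert a space-separated hex byte string such as "04 FE" to bytes."""
    return bytes.fromhex(text)


def is_proper_prefix(shorter, longer):
    """True if `shorter` is a strict leading subsequence of `longer`."""
    return len(shorter) < len(longer) and longer.startswith(shorter)


space_weight = parse_weight("04")
top_of_spaces_6_1 = parse_weight("04 FE")
top_of_spaces_fix = parse_weight("05 FE")

# In the 6.1 data, the space weight is a prefix of the top-of-spaces weight.
print(is_proper_prefix(space_weight, top_of_spaces_6_1))  # True

# With top-of-spaces at 05 FE, the byte 04 alone is no longer a prefix of it.
print(is_proper_prefix(space_weight, top_of_spaces_fix))  # False
```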

Date/Time: Tue Nov 1 15:57:35 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.1 bug in BidiTest.txt


Much of the data in BidiTest.txt has changed, and it appears that the new data is wrong.
Many of the results for auto-LTR changed levels.

For example, the 6.0 version of BidiTest.txt resolved ES and auto-LTR to level 0 but 
version 6.1 resolves it to level 1. See lines 93 & 108 of the file from 2011-07-25, 00:53:54 GMT.

I suspect that the data generation tool actually uses auto-RTL for the bit 
that is documented as auto-LTR.

I recommend reverting this file to the 6.0 version unless there is a better fix.