This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Sun Sep 25 13:04:10 CDT 2011
Contact: corporate@khwilliamson.com
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: PropertyValueAliases.txt inconsistencies
PropertyValueAliases.txt is supposed to be constructed so that the first field is the property name; the second, the value's abbreviated name; the third, the long name; and any additional names are in trailing fields. Four entries in the file are anomalous, and I'm requesting that these be changed to match the other entries in the file. The entries in question are:

dt ; Font ; font
dt ; None ; none
dt ; Sub ; sub
dt ; Wide ; wide

What each of these says is that the long name is the short name lowercased. Thus the long name comes out as less formal than the short name. In no other cases are long names less formal than their short names. I believe that what was intended was something like other entries for the dt property:

dt ; Can ; Canonical ; can

I would like the 4 entries to be changed to:

dt ; Font ; Font ; font
dt ; None ; None ; none
dt ; Sub ; Sub ; sub
dt ; Wide ; Wide ; wide

to correspond with the other entries in dt. Since Unicode property value matching rules call for case to be ignored, I don't understand, anyway, why there is an extra alias for these that is just the lowercased short name. Perhaps this extra alias should be omitted.
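The anomaly described in this report can be detected mechanically. The following is a minimal sketch in Python, assuming the simple "property ; short ; long ; extras" layout described above; the function names are hypothetical, and the sample lines are the ones quoted in the report.

```python
def parse_alias_line(line):
    """Split one 'prop ; short ; long ; extra...' line into stripped fields."""
    content = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not content:
        return None
    return [field.strip() for field in content.split(";")]


def find_lowercased_long_names(lines):
    """Return entries whose long name is merely the short name lowercased."""
    anomalies = []
    for line in lines:
        fields = parse_alias_line(line)
        if fields is None or len(fields) < 3:
            continue
        prop, short, long_name = fields[0], fields[1], fields[2]
        if long_name != short and long_name == short.lower():
            anomalies.append((prop, short, long_name))
    return anomalies


# Entries quoted in the report; the first one is the well-formed example.
sample = [
    "dt ; Can ; Canonical ; can",
    "dt ; Font ; font",
    "dt ; None ; none",
    "dt ; Sub ; sub",
    "dt ; Wide ; wide",
]
anomalies = find_lowercased_long_names(sample)
for entry in anomalies:
    print(" ; ".join(entry))
```

Run against the proposed replacement entries (for example "dt ; Font ; Font ; font"), the same check reports nothing, because the third field then equals the short name rather than its lowercased form.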
Date/Time: Sun Sep 25 14:08:42 CDT 2011
Contact: corporate@khwilliamson.com
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: Inconsistent @missings in UCD .txt files
Some files use the abbreviated property value name for their data, and some use the long one. It would be nice if this were consistent, but I suppose that it's too late to change these. The most annoying case is that ScriptExtensions.txt uses the short name, while Scripts.txt uses the long, and ScriptExtensions is not stand-alone, so code that reads them both has to convert. Since this is a provisional property, it may not be too late to change this. I ask for this to be considered.

But my major request is that the @missing lines in a file use the same style (abbreviated or long) as the rest of the lines in the file for a given property. Unicode hasn't formally specified the format of the lines in the UCD .txt files that give the default value for code points not explicitly mentioned (at least last time I looked it hadn't), yet these lines are meant to be machine-readable. So, I've assumed that the format is stable, and programmed reading them based on the existing paradigm, but I would think that it would be possible to change to the other style in these lines.

For example, there is an annoying inconsistency in the DerivedNormalizationProps.txt file. The values for code points that are explicitly listed are based on the abbreviated property value aliases, 'N' and 'M', but the missing defaults are listed as the long property value:

# @missing: 0000..10FFFF; NFD_QC; Yes
# @missing: 0000..10FFFF; NFC_QC; Yes
# @missing: 0000..10FFFF; NFKD_QC; Yes
# @missing: 0000..10FFFF; NFKC_QC; Yes

I've had to program around this inconsistency, as has Asmus Freytag. The problem is that I'm writing code to expose the UCD db to Perl programs for the next Perl version. I would rather they not have to program around this inconsistency as well. And it seems like the best place to fix it is at the source.
I am requesting that Unicode change these lines to be:

# @missing: 0000..10FFFF; NFD_QC; Y
# @missing: 0000..10FFFF; NFC_QC; Y
# @missing: 0000..10FFFF; NFKD_QC; Y
# @missing: 0000..10FFFF; NFKC_QC; Y

There are other files where this is true as well, namely:

HangulSyllableType-6.1.0d11.txt:# @missing: 0000..10FFFF; Not_Applicable
extracted/DerivedBidiClass-6.1.0d12.txt:# @missing: 0000..10FFFF; Left_To_Right
extracted/DerivedCombiningClass-6.1.0d12.txt:# @missing: 0000..10FFFF; Not_Reordered
extracted/DerivedEastAsianWidth-6.1.0d12.txt:# @missing: 0000..10FFFF; Neutral
extracted/DerivedJoiningType-6.1.0d12.txt:# @missing: 0000..10FFFF; Non_Joining
extracted/DerivedLineBreak-6.1.0d12.txt:# @missing: 0000..10FFFF; Unknown

It would be nice if these were made consistent with the rest of the data in their respective files.
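To illustrate the kind of workaround this report describes, here is a minimal sketch in Python that flags @missing lines using a long value name when the data lines for the same property use short names. It assumes the three-field "range; property; value" form shown for DerivedNormalizationProps.txt and does not handle the two-field @missing lines of the other files. The function names and the small alias table are hypothetical, and the two data lines in the sample are illustrative rather than copied from the file.

```python
import re

# Long-to-short aliases for the quick-check values mentioned in the report.
LONG_TO_SHORT = {"Yes": "Y", "No": "N", "Maybe": "M"}

MISSING_RE = re.compile(r"^#\s*@missing:\s*(.*)$")


def split_fields(text):
    """Split 'a; b; c' into stripped fields, ignoring a trailing comment."""
    return [f.strip() for f in text.split("#", 1)[0].split(";")]


def check_missing_style(lines):
    """Yield (property, long value, suggested short value) for each @missing
    line that uses a long name while the data lines use short names."""
    data_values = {}  # property -> set of values seen on data lines
    missing = []      # (property, value) pairs from @missing lines
    for line in lines:
        line = line.strip()
        m = MISSING_RE.match(line)
        if m:
            fields = split_fields(m.group(1))
            if len(fields) == 3:
                missing.append((fields[1], fields[2]))
        elif line and not line.startswith("#"):
            fields = split_fields(line)
            if len(fields) == 3:
                data_values.setdefault(fields[1], set()).add(fields[2])
    short_names = set(LONG_TO_SHORT.values())
    for prop, value in missing:
        if prop not in data_values:
            continue  # nothing to compare against
        uses_short = data_values[prop] <= short_names
        if uses_short and value in LONG_TO_SHORT:
            yield prop, value, LONG_TO_SHORT[value]


sample = [
    "# @missing: 0000..10FFFF; NFD_QC; Yes",
    "00C0..00C5    ; NFD_QC; N",  # illustrative data line
    "# @missing: 0000..10FFFF; NFC_QC; Yes",
    "0300..0304    ; NFC_QC; M",  # illustrative data line
]
mismatches = list(check_missing_style(sample))
for prop, long_value, short_value in mismatches:
    print(f"{prop}: @missing uses {long_value}, data lines suggest {short_value}")
```

With the requested change applied (an @missing value of Y), the check yields nothing for that property.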
Date/Time: Tue Oct  4 00:24:51 CDT 2011
Contact: jamadagni@gmail.com
Name: Shriramana Sharma
Report Type: Public Review Issue
Opt Subject: Feedback on PRI 206 Unicode 6.1 beta
In the Tifinagh beta chart from http://www.unicode.org/Public/6.1.0/charts/blocks/U2D30.pdf the character 2D7F Tifinagh Consonant Joiner still has the annotation: "shape shown is arbitrary and is not visibly rendered". This is not entirely true. The recent document L2/11-112, based on which the glyph of this character 2D7F has been changed from a boxed TFNCJ to six dots in a dotted box, specifically says that it is desired to display the six dots when a proper biconsonant glyph is not available. It is hence recommended that the above annotation be changed to read: "* shape shown is arbitrary; * the six dots are recommended for fallback use if a biconsonant glyph is not available" or something like that.
Date/Time: Sat Oct  8 00:44:35 CDT 2011
Contact: petercon@microsoft.com
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: Beta review: Bidi category of 1F48C
In TUS 6.0 and the beta, the bidi category of 1F48C is set to L. It should be ON, like all the other symbols in that block. Ken Whistler indicated that the cause was a script, used in drafting the data files, that detected "letter" in the character name, LOVE LETTER.
Date/Time: Sat Oct  8 00:46:10 CDT 2011
Contact: petercon@microsoft.com
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: Beta review: Bidi category of 1F48C
Addendum: provided by Ken Whistler: BTW, when you report that one, there is another with the exact same problem: U+1F524 INPUT SYMBOL FOR LATIN *LETTER*S which is also bc=L, instead of the expected bc=ON. Cf. U+1F520 INPUT SYMBOL FOR LATIN CAPITAL LETTERS which *did* get corrected, and is the expected bc=ON.
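A check for this class of error could look like the following minimal sketch in Python. It assumes the input records are already known to be symbols, so that a name containing the word LETTER or LETTERS combined with bc=L is suspicious. The function name is hypothetical, and the three records carry the values reported in this feedback rather than values read from a data file.

```python
def suspicious_letter_symbols(records, expected_bc="ON"):
    """Return symbol records classed bc=L whose names contain LETTER(S)."""
    flagged = []
    for code_point, name, bidi_class in records:
        words = name.split()
        has_letter_word = "LETTER" in words or "LETTERS" in words
        if bidi_class == "L" and has_letter_word:
            flagged.append((code_point, name, bidi_class, expected_bc))
    return flagged


# Values as reported in the feedback above (assumed, not read from the UCD).
records = [
    (0x1F48C, "LOVE LETTER", "L"),
    (0x1F520, "INPUT SYMBOL FOR LATIN CAPITAL LETTERS", "ON"),
    (0x1F524, "INPUT SYMBOL FOR LATIN LETTERS", "L"),
]
flagged = suspicious_letter_symbols(records)
for code_point, name, bidi_class, expected in flagged:
    print(f"U+{code_point:04X} {name}: bc={bidi_class}, expected bc={expected}")
```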
Date/Time: Tue Oct 25 17:56:09 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.1 DerivedBidiClass.txt bug
# DerivedBidiClass-6.1.0.txt
# Date: 2011-09-16, 21:06:13 GMT [MD]

has moved U+1EE00 - U+1EEFF from default-R to default-AL. The problem is that in the comments U+1EEFF is listed as both AL and R. Please change the comments so that U+1EF00 is the first remaining default-R code point.

Change

# [\u0590-\u05FF \u07C0-\u089F \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\u0001EDFF \U0001EEFF-\U0001EFFF]

to

# [\u0590-\u05FF \u07C0-\u089F \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\u0001EDFF \U0001EF00-\U0001EFFF]

and change

# U+1EEFF - U+1EFFF

to

# U+1EF00 - U+1EFFF

The actual data looks correct.
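The overlap reported here is the kind of thing a small range check can catch. Below is a minimal sketch in Python, assuming the default ranges have already been extracted from the comments as (label, start, end) tuples; the function name is hypothetical, and the ranges listed are only those quoted in this report.

```python
def find_overlaps(ranges):
    """Return (label_a, label_b, first, last) for differently labeled
    ranges that share at least one code point."""
    overlaps = []
    ordered = sorted(ranges, key=lambda r: (r[1], r[2]))
    for i, (label_a, start_a, end_a) in enumerate(ordered):
        for label_b, start_b, end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later range can overlap
            if label_a != label_b:
                overlaps.append((label_a, label_b, start_b, min(end_a, end_b)))
    return overlaps


# Ranges as given in the comments before the requested correction.
as_commented = [
    ("R", 0x1E800, 0x1EDFF),
    ("AL", 0x1EE00, 0x1EEFF),
    ("R", 0x1EEFF, 0x1EFFF),
]
# The same ranges with the requested correction applied.
as_requested = [
    ("R", 0x1E800, 0x1EDFF),
    ("AL", 0x1EE00, 0x1EEFF),
    ("R", 0x1EF00, 0x1EFFF),
]

overlaps_before = find_overlaps(as_commented)
overlaps_after = find_overlaps(as_requested)
for label_a, label_b, first, last in overlaps_before:
    print(f"U+{first:04X}..U+{last:04X} listed as both {label_a} and {label_b}")
```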