Comments on Public Review Issues

L2/15-189

(April 30 - July 21, 2015)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of April 30, 2015, since the previous cumulative document was issued prior to UTC #142 (February 2015). Grayed-out items in the Table of Contents do not have feedback here.

Issue	Name	Feedback Link
302	Feedback on Draft additional repertoire for ISO/IEC 10646:2016 (5th edition)	(feedback)
301	Feedback on Additional repertoire for Amendment 2 (DAM2) to ISO/IEC 10646:2014 (4th edition)	(feedback)
300	Proposed Update UTR #51, Unicode Emoji	(feedback)
299	Representing Additional Types of Flags	(feedback)

The links below go to locations in this document for feedback.

Feedback on Encoding Proposals
Error Reports
Other Reports

Feedback on Encoding Proposals

Date/Time: Fri Jul 24 17:40:06 CDT 2015
Name: Markus Scherer
Report Type: Other Question, Problem, or Feedback
Opt Subject: SignWriting collation, Ken's L2/15-202

Regarding http://www.unicode.org/L2/L2015/15202-signwriting-ducet-aux.txt

I would like to note that Ken's analysis suggests that the fills and rotations
should work properly if assigned "trailing primary weights".
http://www.unicode.org/reports/tr10/#DUCET_Order_Table (Table 13. DUCET Ordering)
http://www.unicode.org/reports/tr10/#Trailing_Weights (7.1.4 Trailing Weights)

I don't know whether it is feasible to assign these characters such trailing
weights in the DUCET. We do use the "trailing" primary weight FFFD for U+FFFD.

Trailing weights can be tailored with CLDR/ICU syntax.
http://www.unicode.org/reports/tr35/tr35-collation.html#Logical_Reset_Positions
(3.11 Logical Reset Positions)

Date/Time: Mon Jul 27 19:00:39 CDT 2015
Name: Garth Wallace
Report Type: Feedback on an Encoding Proposal
Opt Subject: Hentaigana and the Kana Supplement block

The recent hentaigana proposal (L2/15-193) requests that they be encoded as
Standardized Variation Sequences of hiragana. This seems like a good
idea, since fallback in the absence of font support would be to the
standard hiragana, so the results would still be readable. But where
does that leave the Kana Supplement block? That block contains only
two encoded characters, but was allocated 256 code points, presumably
for the future encoding of hentaigana. With hentaigana handled by
SVSes, it seems unlikely that many of those points would ever get
filled. I realize there's no shortage of code points in the UCS, but
still.

One thing I noticed: the hentaigana proposal contains a duplicate of
an existing character. MJ090014 (え variant with mother ideograph 江)
looks like it's already encoded in the Kana Supplement block as
U+1B001 HIRAGANA LETTER ARCHAIC YE.

Feedback on UTRs / UAXes

(No feedback at this time in this section.)

Error Reports

Date/Time: Thu May 14 17:41:18 CDT 2015
Name: Ken Lunde
Report Type: Error Report
Opt Subject: kKorean versus kHangul (UAX #38)

1) The kHangul field is currently covering the characters that 
correspond to the KS X 1001 (4,888) and KS X 1002 (2,856) standards, 
and only nine entries need to be adjusted, as follows, to make this 
alignment correct and up-to-date:

Changes:
U+6635	kHangul	닐
U+66B1	kHangul	닐
U+8D05	kHangul	췌
U+96B8	kHangul	례

Additional field value:
U+90DE	kHangul	낭 랑
U+96B7	kHangul	례 예

Removals:
U+90CE	kHangul	낭

Additions:
U+FA2E	kHangul	낭
U+FA2F	kHangul	예

2) The kHangul field currently specifies that one or more instances of 
U+1100 through U+11FF be used for each value. In reality, it should be 
two or three instances. I suggest the following regex:

[\x{1100}-\x{11FF}]{2}[\x{1100}-\x{11FF}]?

But, because these sequences normalize (via NFC) to characters in the 
range U+AC00 through U+D7A3, I recommend that they be changed accordingly, 
which will result in greater stability and greater compaction (one 
character instead of two or three). In addition to changing the data 
itself from two or three instances of U+1100 through U+11FF to one 
instance of U+AC00 through U+D7A3, the regex in UAX #38 needs to be 
changed to the following:

[\x{AC00}-\x{D7A3}]

3) I recommend that the status of the kKorean field be changed from 
Provisional to Deprecated, and that the use of kHangul be recommended 
for Korean readings.
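
The change proposed in item 2 can be illustrated with a minimal sketch. It assumes Python's standard unicodedata module and handles one reading at a time (fields with several space-separated readings would need to be split first). The function name to_precomposed and the sample value are hypothetical, chosen only for illustration; the two patterns are the ones quoted in item 2, rewritten in Python's escape syntax.

```python
import re
import unicodedata

# Pattern from item 2: two or three conjoining jamo (U+1100..U+11FF).
JAMO_VALUE = re.compile(r"[\u1100-\u11FF]{2}[\u1100-\u11FF]?")
# Proposed replacement pattern: one precomposed syllable (U+AC00..U+D7A3).
SYLLABLE_VALUE = re.compile(r"[\uAC00-\uD7A3]")


def to_precomposed(value: str) -> str:
    """Convert a single jamo-sequence reading to its NFC (precomposed) form."""
    if not JAMO_VALUE.fullmatch(value):
        raise ValueError(f"not a 2- or 3-jamo sequence: {value!r}")
    composed = unicodedata.normalize("NFC", value)
    if not SYLLABLE_VALUE.fullmatch(composed):
        raise ValueError(f"does not compose to one syllable: {value!r}")
    return composed


# Hypothetical sample: the jamo spelling of the reading listed for U+6635 in item 1.
nil_jamo = "\u1102\u1175\u11AF"
print(to_precomposed(nil_jamo))
```

Checking both patterns makes the sketch reject any jamo sequence that matches the current regex but does not compose into exactly one syllable.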

Date/Time: Thu Jun 11 00:18:56 CDT 2015
Name: Sebastian Mayr
Report Type: Error Report
Opt Subject: Conformance Section in UTS46 is confusing

NOTE: Sent to Mark Davis and Editorial Committee already, and acknowledged receipt to user.

The Format section (8.1) under Conformance Testing in UTS46 is confusing.

The explanation of toASCII and toUnicode says to use the provided
processing_option for toUnicode, and to always use nontransitional for toASCII.
However, the implementation section for toUnicode (4.3) says to always call
the processing step with nontransitional, whereas the toASCII parameter
list does provide a processing_option.

It looks to me as if the descriptions for toASCII and toUnicode in the
conformance testing section got mixed up. This also applies to the
descriptions in the header of IdnaTest.txt.

Date/Time: Sun Jul 12 11:29:03 CDT 2015
Name: Laurentiu Iancu
Report Type: Error Report
Opt Subject: Missing copyright / terms of use statements in the security and UCA data files

Unlike the UCD files, several UTS #39 and UTS #10 data files (in
Public/security/latest/ and Public/UCA/latest/) are missing copyright and
terms of use statements.  The affected files are the following:

Public/security/latest/
	All of the files in that directory
Public/UCA/latest/
	CollationTest.html
	All of the files inside CollationTest.zip

This issue was raised and discussed briefly during the release of Unicode 8.0
(BRS item #111).  The conclusion there was that it should be a priority to fix
for the next releases.

Date/Time: Sun Jul 12 11:31:17 CDT 2015
Name: Laurentiu Iancu
Report Type: Error Report
Opt Subject: Missing # EOF lines in idna, security, and UCA data files

All of the UCD files end with a # EOF line.  Several UTS #46, #39, and #10
data files (in Public/idna/latest/, Public/security/latest/, and
Public/UCA/latest/) do not have such lines.  Specifically, the files with
missing # EOF lines are the following:

Public/idna/latest/
	IdnaMappingTable.txt
	IdnaTest.txt
Public/security/latest/
	All of the files in that directory (ReadMe.txt is N/A)
Public/UCA/latest/
	allkeys.txt
	decomps.txt
	All of the files inside CollationTest.zip

This issue was also discussed briefly during the release of Unicode 8.0 (in
relation to BRS item #111). However, compared to the issue of missing
copyright and terms of use statements, reported separately, the absence of #
EOF lines does not seem to constitute a priority. It ought to be examined by
the UTC, though, to decide whether updating all of the tools that generate the
data (including test) files listed above is a worthy investment, to make them
consistent with the UCD files in terms of # EOF lines.
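
A check for the two conventions discussed in these reports could be sketched as follows. This is only an illustration and not the tooling used to generate the data files: the function names are hypothetical, and the copyright test assumes that the notice is a "#" comment line containing the word "Copyright" or the © sign, which fits plain-text data files but not CollationTest.html.

```python
def has_eof_marker(text: str) -> bool:
    """Return True if the last non-blank line is exactly '# EOF'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == "# EOF"


def has_copyright_notice(text: str) -> bool:
    """Return True if a '#' comment line mentions a copyright (assumed wording)."""
    return any(
        line.startswith("#") and ("copyright" in line.lower() or "©" in line)
        for line in text.splitlines()
    )


# Hypothetical file contents, for illustration only.
sample = "# Copyright (c) 2015 Unicode, Inc.\n0041 ; valid\n\n# EOF\n"
print(has_eof_marker(sample), has_copyright_notice(sample))
```

To audit a directory, one would read each file as UTF-8 text and pass its contents to both functions.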

Date/Time: Tue Jul 28 16:52:40 CDT 2015
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Error in Full Emoji Data chart for Android glyphs

The Full Emoji Data chart at
http://www.unicode.org/emoji/charts/full-emoji-list.html
includes some flag images in the Android column that actually
do not exist on any version of Android software.

Here is the list of flags that should be removed from that chart:

* flag for Antarctica
* flag for St. Barthélemy
* flag for Guadeloupe
* flag for Heard & McDonald Islands
* flag for St. Martin
* flag for Martinique
* flag for St. Pierre & Miquelon
* flag for Réunion
* flag for Svalbard & Jan Mayen
* flag for Wallis & Futuna
* flag for Mayotte
* flag for French Guiana
* flag for New Caledonia
* flag for Caribbean Netherlands
* flag for St. Helena
* flag for U.S. Outlying Islands
* flag for Western Sahara
* flag for Falkland Islands
* flag for South Georgia & South Sandwich Islands
* flag for French Southern Territories
* flag for Clipperton Island
* flag for Diego Garcia
* flag for Ceuta & Melilla
* flag for Canary Islands
* flag for Tristan da Cunha

Other Reports

Date/Time: Thu Jun 11 07:38:01 CDT 2015
Name: Ken Lunde
Report Type: Other Question, Problem, or Feedback
Opt Subject: Proposed alias or annotation for U+1F52B PISTOL

Note: This has already been sent to Emoji Subcommittee.

Because some people and organizations make a distinction between pistol and
handgun, with the former being of the semi-automatic variety, and the latter
being an umbrella term that covers pistols and revolvers, and because some
implementations of U+1F52B PISTOL use an image of a revolver, I propose that
the alias or annotation, 'handgun', be added to this character, and to
consider 'revolver' as a second alias or annotation.

Date/Time: Wed Jun 17 19:43:38 CDT 2015
Name: Richard Gillam
Report Type: Error Report
Opt Subject: Word-break handling of fullwidth digits

My application's word-counting code is based on the ICU word-break iterator
(UBRK_WORD), and it's getting wrong results with CJK text that includes
fullwidth digits.  Any numbers written with fullwidth digits aren't getting
counted as numbers by ICU-- instead, the individual digits get treated the
same as punctuation or whitespace.  I looked in
http://www.unicode.org/Public/UCD/latest/ucd/auxiliary/WordBreakProperty.txt,
and I notice that the fullwidth digits are not mentioned in this file at all--
shouldn't they be given the "Numeric" property, like all the other digits?
They do have the Nd general category, like all the other digits.  Unlike the
other digits, they have the ID line-break property, but this shouldn't matter.

Tell me what I'm missing here...
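
The general-category observation above can be checked with a short sketch using Python's standard unicodedata module, which exposes General_Category but not the Word_Break or Line_Break properties. The helper fold_fullwidth_digits is a hypothetical workaround and not something proposed in the report: it assumes that mapping U+FF10..U+FF19 to ASCII digits before segmentation is acceptable for counting purposes.

```python
import unicodedata

# Fullwidth digits U+FF10..U+FF19.
FULLWIDTH_DIGITS = [chr(cp) for cp in range(0xFF10, 0xFF1A)]


def describe(ch: str) -> tuple:
    """Return (code point label, general category, decimal value) for a digit."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.decimal(ch))


def fold_fullwidth_digits(text: str) -> str:
    """Hypothetical workaround: map fullwidth digits one-to-one to ASCII digits."""
    table = {cp: cp - 0xFF10 + 0x30 for cp in range(0xFF10, 0xFF1A)}
    return text.translate(table)


for ch in FULLWIDTH_DIGITS:
    print(describe(ch))
print(fold_fullwidth_digits("\uFF11\uFF12\uFF13"))
```

Because the mapping is one code point to one code point, boundaries found in the folded text apply at the same offsets in the original text.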

Date/Time: Tue Jun 30 05:13:23 CDT 2015
Name: Marcel Schneider
Report Type: Error Report
Opt Subject: Error in the Core specifications

IMHO the first sentence of the Core Specificationsʼ 23.2 Word Joiner is wrong in
that it generalizes from the absence of line break opportunities to the absence
of word boundaries. In practice, the word boundary behavior of U+FEFF and
U+00A0 is the opposite: they indicate a word boundary. From this I extrapolate
to U+2060, which is not a part of any font shipped with Windows 7 and which I
therefore canʼt test.

At the end of the next paragraph, Unicode recommends ignoring the word joiner
whenever the issue is not word breaking or line breaking. As far as the ZWNBSP
is concerned, this character is not ignored when word boundaries are determined.
E.g., when the letter apostrophe is bracketed with U+FEFFs, it behaves like a
punctuation apostrophe. This makes the word joiners even more useful.

Please let me know if Unicode can make sense and use of the above for the
ongoing TUS overhaul without a discussion of this issue being launched on the
Mailing List. If not, Iʼm ready to mail the topic, or to mention it in the
current one (WORD JOINER vs ZWNBSP). However, I donʼt want to reinforce my
probable reputation of someone who loves criticising other peopleʼs work.

Best regards,
Marcel Schneider

PS: Dimly I would suggest that one may wish to add a plural s on the front
page of the Core Specs.
