Comments on Public Review Issues

L2/15-019

(October 24, 2014 - January 29, 2015)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of January 29, 2015, since the previous cumulative document was issued prior to UTC #141 (November 2014). Grayed-out items in the Table of Contents do not have feedback here.

Issue	Name	Feedback Link
290	Proposed Update UAX #29, Unicode Text Segmentation	(feedback) NEW
289	Proposed Update UAX #38, Unicode Han Database (Unihan)	(feedback) NEW
288	Proposed Update UAX #41, Common References for Unicode Standard Annexes	(feedback)
287	Proposed Update UAX #24, Unicode Script Property	(feedback)
286	Proposed Draft UTR #51, Unicode Emoji	(feedback) NEW
285	Proposed Update UTS #10, Unicode Collation Algorithm	(feedback) NEW
284	Proposed Update UAX #44, Unicode Character Database	(feedback)
283	Proposed Update UAX #14, Unicode Line Breaking Algorithm	(feedback)
282	Proposed Update UAX #31, Unicode Identifier and Pattern Syntax	(feedback) NEW
280	Proposed Update UTR #23, The Unicode Character Property Model	(feedback) NEW
279	Proposed Update UAX #9, Unicode Bidirectional Algorithm	(feedback)

The links below go to locations in this document for feedback.

Feedback on Encoding Proposals NEW
Feedback on UTRs / UAXes
Error Reports NEW
Other Reports NEW

Feedback on Encoding Proposals

Date/Time: Mon Nov 3 14:24:40 CST 2014
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: ZANABAZAR SQUARE LETTER -A considered annoying

Granted, it's a Good Thing that LETTER -A in this script matches up with 
LETTER -A in Tibetan.  But it's a Bad Thing that you're introducing yet 
another hard-coded exception to the rules about which character names collide 
with which other names.

Date/Time: Wed Nov 19 15:28:39 CST 2014
Name: Tim Larson
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/14-235R2 menorah

First, thank you for adding the note "Hanukiah" to U+1F54E.

Second, consider adding MENORAH WITH SEVEN BRANCHES at U+1F54F. 
The temple menorah is a symbol for Judaism almost as widely recognized 
as the Davidic star. The hanukiah only represents one, relatively minor, holiday within Judaism.

Feedback on UTRs / UAXes

(No feedback at this time in this section.)

Error Reports

Date/Time: Thu Nov 13 16:55:23 CST 2014
Name: Richard Ishida
Report Type: Error Report
Opt Subject: Incorrect glyph for MONGOLIAN LETTER YA medial second form

http://www.unicode.org/Public/UNIDATA/StandardizedVariants.html (Standardized 
Variants) shows a glyph for MONGOLIAN LETTER YA in the second medial form with 
an upturn to the left. I believe this image should show a straight downward line.

Reasons:

1. the first initial form has an upturn, and the second initial form is straight

2. Professor Quejingzhabu's chart at 
http://www.babelstone.co.uk/Mongolian/MGWBM/MGWBM_C034-C035.jpg 
shows the upturn for the first medial form and the straight line for the 
second medial form.

3. the Mongolian Baiti, Mongolian White, Mongolian Writing, and Noto Sans 
Mongolian fonts all produce the upturn for the standard medial form and the 
straight line when followed by FVS#1 (To test these fonts you can go to 
http://rishida.net/scripts/block/mongolian.html#char1836 and change the font 
by opening the blue control at the bottom right of the window. See the top 
table in that section.)

Date/Time: Mon Jan 12 22:20:39 CST 2015
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Soft_Dotted definition overreaches

Principle P9 in Section 3.6 of Core Spec (Version 7.0) overreaches.
Here is how it reads right now:

P9 [Guideline] When a nonspacing mark is applied to the letters
i and j or any other character with the Soft_Dotted property, the
inherent dot on the base character is suppressed in display.

The problem is that it talks about all nonspacing marks, even 
those that go below or around letters.

I suggest we change it to "nonspacing mark *above*", and better 
additionally clarify it as characters with ccc=230 (which in 
practice happens to be the case).
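
The narrowing suggested in this report can be illustrated with a minimal sketch. It assumes that "nonspacing mark above" is tested as canonical combining class 230, as the report proposes, and it uses a two-letter stand-in for the Soft_Dotted set because Python's unicodedata module does not expose that property. The names SOFT_DOTTED_SAMPLE and suppresses_dot are hypothetical and are not taken from the Core Spec.

```python
import unicodedata

# Illustrative subset only: the real Soft_Dotted set is defined in
# PropList.txt and contains more characters than these two.
SOFT_DOTTED_SAMPLE = {"i", "j"}

CCC_ABOVE = 230  # canonical combining class "Above"


def suppresses_dot(base: str, mark: str) -> bool:
    """Return True if, under the proposed narrowing of P9, applying
    `mark` to `base` would suppress the inherent dot of `base`."""
    return (
        base in SOFT_DOTTED_SAMPLE
        and unicodedata.combining(mark) == CCC_ABOVE
    )


if __name__ == "__main__":
    # U+0301 COMBINING ACUTE ACCENT sits above the base letter.
    print(suppresses_dot("i", "\u0301"))
    # U+0323 COMBINING DOT BELOW sits below the base letter.
    print(suppresses_dot("i", "\u0323"))
```

Under the current wording of P9, both calls would count as suppressing the dot; under the suggested wording only the mark with ccc=230 does.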

Date/Time: Wed Jan 14 02:02:37 CST 2015
Name: Véronique Dejeux
Report Type: Error Report
Opt Subject: Inverting source characters and target strings in 7.0 confusables.txt may cause problems

Hello,

In Version 7.0.0 of the MA table in confusables.txt, certain mappings 
were inverted: source characters became target strings.

For example:

confusables.txt version 7.0.0 states:
00F6 ;  0629 ;  MA      # ( ö → ‎ة‎ ) LATIN SMALL LETTER O WITH DIAERESIS → ARABIC LETTER TEH MARBUTA

whereas confusables.txt version 6.3.0 states:
0629 ;    00F6 ;    MA    # ( ‎ة‎ → ö ) ARABIC LETTER TEH MARBUTA → LATIN SMALL LETTER O WITH DIAERESIS    #

Can you explain the reason for these changes?  I suppose that they fix 
the idempotency issue that existed in previous versions. They may cause 
other problems, however.

For instance, consider U+0629 ARABIC LETTER TEH MARBUTA. We can assume 
that this code point is confusable with U+00F6 LATIN SMALL LETTER O WITH DIAERESIS.

However, by applying the skeleton function defined in UTS #39, section 4 
Confusable Detection using confusables.txt version 7.0.0, we obtain:
skeleton(00F6) = 006F 0308, since 00F6 has an NFD decomposition mapping
skeleton(0629) = 0629, since 0629 has neither an NFD decomposition mapping nor an MA mapping

Thus, according to version 7.0.0, since skeleton(00F6) is not equal to 
skeleton(0629), we can also assume that U+0629 ARABIC LETTER TEH MARBUTA 
is NOT confusable with U+00F6 LATIN SMALL LETTER O WITH DIAERESIS.

(Note that there was no ambiguity in confusables.txt version 6.3.0)

Are these two characters confusable according to version 7.0.0? It would 
appear so since they are defined in the confusables.txt MA table. But when 
we apply the skeleton() function they are not confusable.
Thanks for clarifying this point.
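
The comparison described in this report can be reproduced with a minimal sketch. It assumes the skeleton procedure is NFD, then a per-character mapping through the table, then NFD again, and it models each version of the MA table as a single entry copied from the two lines quoted above. All other entries of the real tables are ignored, so results for other characters may differ. The names MA_6_3, MA_7_0, skeleton and confusable are hypothetical.

```python
import unicodedata

# One-entry extracts of the MA table, taken from the two quoted lines.
# The real confusables.txt tables contain many more mappings.
MA_6_3 = {"\u0629": "\u00F6"}  # 0629 -> 00F6 (version 6.3.0)
MA_7_0 = {"\u00F6": "\u0629"}  # 00F6 -> 0629 (version 7.0.0)


def skeleton(text: str, table: dict) -> str:
    """NFD, map each character through `table`, then NFD again."""
    nfd = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(a: str, b: str, table: dict) -> bool:
    """Two strings are treated as confusable if their skeletons match."""
    return skeleton(a, table) == skeleton(b, table)


if __name__ == "__main__":
    for label, table in (("6.3.0", MA_6_3), ("7.0.0", MA_7_0)):
        print(label, confusable("\u00F6", "\u0629", table))
```

In this sketch the 7.0.0 entry is never applied: U+00F6 is decomposed to 006F 0308 by the first NFD step, so the precomposed source character no longer appears when the table is consulted.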

Date/Time: Wed Jan 21 11:22:37 CST 2015
Name: Joshua Dong
Report Type: Error Report
Opt Subject: Unicode 7.0 Hangul Jamo U+11xx errors

Hi,
For the PDF located at http://www.unicode.org/charts/PDF/U1100.pdf,
there are a few small issues:
* The entry under U+113D has a spelling mistake, voicless -> voiceless
* The entry under U+115F has an erroneous character. While not noticeable,
  the character is actually U+E45F and not U+115F.
* Similarly, the entry under U+1160 has an erroneous character. While not 
  noticeable, the character is actually U+E460 and not U+1160.

Thank you,
Joshua Dong
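
One way to check code points of the kind mentioned in this report, once text has been extracted from a chart, is to test their General_Category. The sketch below assumes only Python's standard unicodedata module; the helper name is_private_use is hypothetical, and the sketch does not read the PDF itself.

```python
import unicodedata


def is_private_use(code_point: int) -> bool:
    """Return True if the code point has General_Category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


if __name__ == "__main__":
    # Code points reported as found in the chart, and the ones expected.
    for cp in (0xE45F, 0x115F, 0xE460, 0x1160):
        print(f"U+{cp:04X}", is_private_use(cp))
```

U+E45F and U+E460 fall in the Private Use Area (U+E000..U+F8FF), whereas U+115F and U+1160 are the Hangul choseong and jungseong fillers.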

Other Reports

Date/Time: Tue Dec 23 09:48:58 CST 2014
Contact: wjgo_10009@btinternet.com
Name: William Overington
Report Type: Other Question, Problem, or Feedback
Opt Subject: Unicode Encoding Policy

Unicode encoding policy

There is a document.

http://www.unicode.org/L2/L2014/14250.htm

Within the document, the following are interesting items.

E.1.7 Emoji Additions: popular requests [Edberg, Davis, L2/14-272]

Discussion. UTC took no action at this time.

Later, in the same document is the following.

E.1.7 Emoji Additions: popular requests [Edberg, Davis, L2/14-272R]

[141-C6] Consensus: Add the block U+1F900..U+1F9FF Supplemental Symbols and Pictographs for Unicode version 8.0.

The referenced document contains links to various requests and petitions for additional emoji characters.

In the referenced document, within section C, is the following.

5. Are the proposed characters in current use by the user community?
No

----

This appears to be a major change in encoding policy.

This, in my opinion, is a welcome, progressive change in policy that allows
new characters for use in a pure electronic technology to be added into
regular Unicode without a requirement to first establish widespread use by
using an encoding within a Unicode Private Use Area.

I feel that it is now therefore possible to seek encoding of symbols, perhaps
in abstract emoji format and semi-abstract emoji format, so as to implement a
system for communication through the language barrier by whole localizable
sentences, with that system designed by interested people without the need to
produce any legacy data that is encoded using an encoding within a Unicode
Private Use Area.

A first draft petition could be produced, with later drafts developed by
consensus. Once drafting has produced a document for an initial core
system, a petition could be submitted to the Unicode Technical Committee.

Once in use, the system could have additional symbols added to it, gradually,
so as to expand its capabilities as needs are identified.

Would the Unicode Technical Committee be willing to encourage this development
please?

William Overington

23 December 2014
