Comments on Public Review Issues

L2/18-231

(April 26 - July 23, 2018)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of July 18, 2018, since the previous cumulative document was issued prior to UTC #155 (May 2018). Some items in the Table of Contents do not have feedback here.

Feedback to UTC / Encoding Proposals

Date/Time: Tue Jun 5 16:31:58 CDT 2018
Name: Eduardo Marin Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Creation of the new block to accommodate new number forms


This block would accommodate both the segmented digit numerals proposed in:
http://www.unicode.org/L2/L2017/17435r-terminals-prop.pdf and the remaining
tally mark systems left to encode: https://www.unicode.org/L2/L2018/18088-tally-marks.pdf

Date/Time: Fri Jun 8 17:24:43 CDT 2018
Name: Eduardo Marin Silva
Report Type: Feedback on an Encoding Proposal
Opt Subject: Allocating all music related Chinese symbols into a single block


I propose allocating all Chinese musical symbols to a single block. This
includes three rows to accommodate the lute notation:
https://www.unicode.org/L2/L2017/17311-n4848-lute.pdf, one row for the flute
notation: https://www.unicode.org/L2/L2017/17312-n4849-flute.pdf, and one
more row for the Gongche symbols: http://www.unicode.org/L2/L2017/17087-gonche-notation.pdf.
I propose the range 1D250-1D29F and the name Chinese
Musical Notation.

Date/Time: Wed Jul 18 11:13:33 CDT 2018
Name: Doug Ewell
Report Type: Feedback on an Encoding Proposal
Opt Subject: Feedback on various proposals (L2/18-185 L2/18-186 L2/18-187 L2/18-188 L2/18-215 L2/18-233)


L2/18-185 through 18-188 (Henri Sivonen):

These proposals recommend in various ways that Unicode definitions be
modified to match, or at least explicitly mention, the independently
developed WHATWG "Web Platform" standard, and in particular that the use of
UTF-16 and UTF-32 be discouraged to a greater or lesser extent in favor of
UTF-8, to conform with the WHATWG standard.

The Unicode Consortium should recognize that it has the preeminent role in
defining the Unicode Standard, and that the definitions and conformance
models of the Unicode Standard are not and should not be dependent on
standards promoted by other organizations (other than WG2).

It should also recognize that the Web platform, while vitally important, is
not the only platform in which Unicode text is used, and that considerations
that are appropriate for the Web may not be suitable for all uses of
Unicode.

--

L2/18-215 (Tom Adams and Jerri Wilson)

This proposal is typical of many emoji proposals over the past few years,
but I address this one specifically as one of the targeted users.

Neither the fact that millions of people suffer from asthma and use an
inhaler, nor the fact that medicines employing other intake forms have emoji
but inhaled medications do not, nor the fact that the occurrence of the
English word "inhaler" on Google Trends is comparable to the occurrence of
"syringe," constitutes a justification for encoding this as a Unicode
character.

Is it expected that asthmatics would text or tweet something to the effect
of "Darn, I forgot my [inhaler emoji] and now I'm coughing"? Would it be
used to identify oneself as an asthmatic, and if so, why? Is there wide
demand for this type of pictograph, or will it be created on the basis of a
single proposal?

--

L2/18-233

Gosh, if only the subdivision flag emoji mechanism hadn't been arbitrarily
confined to "RGI" sequences, requests like this wouldn't be necessary. This
type of sequence, and many others like it, would just work (assuming the
glyphs were available, which is a separate matter).

Date/Time: Mon Jul 23 01:12:32 CDT 2018
Name: Piotr Grochowski
Report Type: Feedback on an Encoding Proposal
Opt Subject: Reaction to L2/18-241

"We reviewed this document, which requests adding 1,120 superscripts,
subscripts, and small capitals. In our opinion, adding 1,120 such
characters is not a good idea architecturally. A full proposal with
orthographic evidence for the specific characters could, however, be
considered, if the author provided such a document."

Original proposal: L2/18-206

This is a huge change, but an important one. Currently, superscripts
and subscripts from markup render poorly, as a transformation of the base
character. This proposal, however, would help font designers customize
superscripts, subscripts and small capitals, and help rendering engines
migrate from blindly transforming the base character to using the Unicode
equivalents where possible.

Feedback on UTRs / UAXes

Name: Tencent's Xuanwu Lab

Date: Wed, 6 Jun 2018 16:37:42 +0800

Subject: Unicode error report

Hello,

I found a mistake in "UTR 36 Unicode Security Considerations".

Error:
【http://www.unicode.org/reports/tr36/#Inadequate_Rendering_Support
Table 5. Inadequate Rendering Support
3b	ẹl.com	0065 0323 006C 002E 0063 006F 006D	xn--l-ewm.com】

'xn--l-ewm.com' is incorrect  ---->  'xn--el-5wb.com' is correct

Best regards,
xisigr
Tencent's Xuanwu Lab (tencent.com)
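
The two labels in this report can be reproduced with a short Python sketch. It assumes Python's built-in "punycode" codec and uses a hypothetical helper, ace_label, which applies no IDNA mapping or validation. Under those assumptions, the result depends on whether the listed code point sequence is normalized to NFC before encoding:

```python
import unicodedata


def ace_label(text: str) -> str:
    """Punycode-encode one label and add the ACE prefix.

    Hypothetical helper for illustration only: no IDNA mapping or validation.
    """
    return "xn--" + text.encode("punycode").decode("ascii")


# The sequence listed in Table 5, row 3b: 0065 0323 006C
decomposed = "\u0065\u0323\u006C"
# NFC composes U+0065 U+0323 into the single code point U+1EB9
composed = unicodedata.normalize("NFC", decomposed)

print(ace_label(decomposed))  # xn--el-5wb
print(ace_label(composed))    # xn--l-ewm
```

In this sketch, the value suggested in the report corresponds to the decomposed sequence as listed, while the value in the table corresponds to its NFC form.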

Date/Time: Tue Jul 3 01:31:39 CDT 2018
Name: Wang Yifan
Report Type: Other Question, Problem, or Feedback
Opt Subject: Question on UTS#51 and emoji-sequences.txt


[Transferred from my mailing list post]

While looking at
https://unicode.org/Public/emoji/11.0/emoji-sequences.txt

I see that line 16 says:
----------
#   type_field: any of {Emoji_Combining_Sequence, Emoji_Flag_Sequence,
Emoji_Modifier_Sequence}
#     The type_field is a convenience for parsing the emoji sequence
files, and is not intended to be maintained as a property.
----------

This field, however, actually contains "Emoji_Keycap_Sequence" and
"Emoji_Tag_Sequence", instead of "Emoji_Combining_Sequence" (it was
already so in 5.0).
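
One way to check which type_field values a file actually uses is to collect the second semicolon-separated field of every data line. The following is a minimal sketch, not an official parser; the function name type_fields and the sample lines are illustrative assumptions (the samples only imitate the general shape of the file and are not copied from it):

```python
def type_fields(lines):
    """Collect the distinct second fields of the data lines."""
    found = set()
    for line in lines:
        line = line.strip()
        # Skip blank lines and whole-line comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        if len(fields) >= 2:
            found.add(fields[1].strip())
    return found


# Illustrative lines only (assumed shape: code points ; type_field ; description).
sample = [
    "# comment line",
    "0023 FE0F 20E3 ; Emoji_Keycap_Sequence ; keycap",
    "1F1E6 1F1E8 ; Emoji_Flag_Sequence ; flag",
    "261D 1F3FB ; Emoji_Modifier_Sequence ; modifier",
]

for name in sorted(type_fields(sample)):
    print(name)
```

Running the same function over the real file would give the set to compare against the header comment.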

Going back to
http://www.unicode.org/reports/tr51/

under Section 1.4.6:
----------
ED-21. emoji keycap sequence set — The specific set of emoji sequences
listed in the emoji-sequences.txt file [emoji-data] under the category
Emoji_Keycap_Sequence.
ED-22. emoji modifier sequence set — The specific set of emoji
sequences listed in the emoji-sequences.txt file [emoji-data] under
the category Emoji_Modifier_Sequence.
ED-23. RGI emoji flag sequence set — The specific set of emoji
sequences listed in the emoji-sequences.txt file [emoji-data] under
the category Emoji_Flag_Sequence.
ED-24. RGI emoji tag sequence set — The specific set of emoji
sequences listed in the emoji-sequences.txt file [emoji-data] under
the category Emoji_Tag_Sequence.
----------

I'm not sure if the "category" means "type_field" or headings in the
txt file, as the headings do not contain underscores. If it means
"type_field", then the description of type_field above is wrong.

Also, in Section 1.4.5:
----------
ED-14c. emoji keycap sequence — A sequence of the following form:

emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}

- These characters are in the emoji-sequences.txt file listed under
the category Emoji_Keycap_Sequence
----------
While in the previous version (rev. 12):
----------
ED-14c. emoji keycap sequence — An emoji combining sequence of the
following form:

emoji_keycap_sequence := [0-9#*] \x{FE0F 20E3}

- These characters are in the emoji-sequences.txt file listed under
the category Emoji_Combining_Keycap_Sequence
----------
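
The emoji_keycap_sequence pattern quoted in both revisions can be tried out directly. This is a small sketch that assumes Python's re module; the names EMOJI_KEYCAP_SEQUENCE and is_keycap_sequence are made up for illustration:

```python
import re

# [0-9#*] followed by U+FE0F VARIATION SELECTOR-16
# and U+20E3 COMBINING ENCLOSING KEYCAP.
EMOJI_KEYCAP_SEQUENCE = re.compile(r"[0-9#*]\uFE0F\u20E3")


def is_keycap_sequence(text: str) -> bool:
    """Return True if the whole string is one keycap sequence."""
    return EMOJI_KEYCAP_SEQUENCE.fullmatch(text) is not None


# Every base character the pattern allows, each followed by FE0F 20E3.
candidates = [base + "\uFE0F\u20E3" for base in "0123456789#*"]
print(all(is_keycap_sequence(c) for c in candidates))
print(is_keycap_sequence("1\u20E3"))  # variation selector missing
```

The pattern itself enumerates twelve possible sequences, so it can be compared against the entries of that type in the data file.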

It seems there was some kind of confusion about the terms, but in any case, isn't
the last line of ED-14c redundant in the current revision? (Or is
"Emoji_Combining_Sequence" intended?)

Error Reports

Date/Time: Sat Jun 9 20:10:04 CDT 2018
Name: Martin J. Dürst
Report Type: Error Report
Opt Subject: Missing explanation re. titlecase for Georgian

As explained at http://www.unicode.org/versions/Unicode11.0.0/#Migration,
Mkhedruli Georgian letters do not have titlecase mappings to Mtavruli
letters. Although the other (complicated!) aspects of Georgian casing are
explained in the chapter on Georgian
(http://www.unicode.org/versions/Unicode11.0.0/ch07.pdf, Section 7.7,
Georgian, pp. 320-321) in the standard, this peculiar aspect is not, but
should be. At the very least, a pointer to the respective information (if
any) somewhere else in the standard should be provided.

(sorry to label this as an Error, but I didn't find anything better)
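
The case mappings described above can be inspected with a short Python sketch. It assumes the interpreter's own Unicode database, whose version is reported by unicodedata.unidata_version; with a database older than 11.0 no Mtavruli mapping would be expected at all. The helper name case_mappings is hypothetical:

```python
import unicodedata


def case_mappings(ch: str):
    """Return (uppercase, titlecase) of a single character as strings."""
    return ch.upper(), ch.title()


mkhedruli_an = "\u10D0"  # GEORGIAN LETTER AN
upper, title = case_mappings(mkhedruli_an)

print("Unicode database:", unicodedata.unidata_version)
print("uppercase:", f"U+{ord(upper):04X}")
print("titlecase:", f"U+{ord(title):04X}")
```

If the migration note is reflected in the data, the titlecase line should show the Mkhedruli letter unchanged even where the uppercase line shows a different code point.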

Date/Time: Wed Jun 13 09:35:06 CDT 2018
Name: Emmanuel Froissart
Report Type: Error Report
Opt Subject: Wrong order of fields in PropertyAliases.txt


In PropertyAliases.txt, the first field (preferred abbreviated name) and second field 
(preferred long name) of aliases related to CJK are reversed.

The following lines:

cjkAccountingNumeric     ; kAccountingNumeric
cjkOtherNumeric          ; kOtherNumeric
cjkPrimaryNumeric        ; kPrimaryNumeric
cjkCompatibilityVariant  ; kCompatibilityVariant
cjkIICore                ; kIICore
cjkIRG_GSource           ; kIRG_GSource
cjkIRG_HSource           ; kIRG_HSource
cjkIRG_JSource           ; kIRG_JSource
cjkIRG_KPSource          ; kIRG_KPSource
cjkIRG_KSource           ; kIRG_KSource
cjkIRG_MSource           ; kIRG_MSource
cjkIRG_TSource           ; kIRG_TSource
cjkIRG_USource           ; kIRG_USource
cjkIRG_VSource           ; kIRG_VSource
cjkRSUnicode             ; kRSUnicode                  ; Unicode_Radical_Stroke; URS

should be corrected to:

kAccountingNumeric       ; cjkAccountingNumeric
kOtherNumeric            ; cjkOtherNumeric
kPrimaryNumeric          ; cjkPrimaryNumeric
kCompatibilityVariant    ; cjkCompatibilityVariant
kIICore                  ; cjkIICore
kIRG_GSource             ; cjkIRG_GSource
kIRG_HSource             ; cjkIRG_HSource
kIRG_JSource             ; cjkIRG_JSource
kIRG_KPSource            ; cjkIRG_KPSource
kIRG_KSource             ; cjkIRG_KSource
kIRG_MSource             ; cjkIRG_MSource
kIRG_TSource             ; cjkIRG_TSource
kIRG_USource             ; cjkIRG_USource
kIRG_VSource             ; cjkIRG_VSource
kRSUnicode               ; cjkRSUnicode                ; Unicode_Radical_Stroke; URS

In PropertyValueAliases.txt, the following commented lines:

# cjkAccountingNumeric (cjkAccountingNumeric)
# cjkCompatibilityVariant (cjkCompatibilityVariant)
# cjkIICore (cjkIICore)
# cjkIRG_GSource (cjkIRG_GSource)
# cjkIRG_HSource (cjkIRG_HSource)
# cjkIRG_JSource (cjkIRG_JSource)
# cjkIRG_KPSource (cjkIRG_KPSource)
# cjkIRG_KSource (cjkIRG_KSource)
# cjkIRG_MSource (cjkIRG_MSource)
# cjkIRG_TSource (cjkIRG_TSource)
# cjkIRG_USource (cjkIRG_USource)
# cjkIRG_VSource (cjkIRG_VSource)
# cjkOtherNumeric (cjkOtherNumeric)
# cjkPrimaryNumeric (cjkPrimaryNumeric)
# cjkRSUnicode (cjkRSUnicode)

should be corrected to:

# cjkAccountingNumeric (kAccountingNumeric)
# cjkCompatibilityVariant (kCompatibilityVariant)
# cjkIICore (kIICore)
# cjkIRG_GSource (kIRG_GSource)
# cjkIRG_HSource (kIRG_HSource)
# cjkIRG_JSource (kIRG_JSource)
# cjkIRG_KPSource (kIRG_KPSource)
# cjkIRG_KSource (kIRG_KSource)
# cjkIRG_MSource (kIRG_MSource)
# cjkIRG_TSource (kIRG_TSource)
# cjkIRG_USource (kIRG_USource)
# cjkIRG_VSource (kIRG_VSource)
# cjkOtherNumeric (kOtherNumeric)
# cjkPrimaryNumeric (kPrimaryNumeric)
# cjkRSUnicode (kRSUnicode)
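
The correction requested for PropertyAliases.txt amounts to swapping the first two fields of each affected line. A minimal sketch of that transformation follows; swap_first_two_fields is a hypothetical helper, and it normalizes spacing rather than preserving the file's column alignment:

```python
def swap_first_two_fields(line: str) -> str:
    """Swap the first two semicolon-separated fields; later fields stay in place."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) < 2:
        return line
    fields[0], fields[1] = fields[1], fields[0]
    return " ; ".join(fields)


print(swap_first_two_fields("cjkIICore                ; kIICore"))
# kIICore ; cjkIICore
print(swap_first_two_fields("cjkRSUnicode ; kRSUnicode ; Unicode_Radical_Stroke; URS"))
# kRSUnicode ; cjkRSUnicode ; Unicode_Radical_Stroke ; URS
```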

Date/Time: Sat Jun 23 11:18:05 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Ambiguity when using underscores in character names

Chapter 4, page 180 says “a common strategy is to replace any hyphen-minus
or space in a character name by a single “_” when constructing a formal
identifier from a character name. [...] such identifiers are guaranteed to
be unique, because of the special rules for character name matching.” It’s
guaranteed in the current version, but is not guaranteed to be future-
compatible: “X- -Y” and “X - Y” are valid non-matching names which both
become “X___Y”.
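
The collision can be demonstrated in a few lines of Python. The function name_to_identifier is a hypothetical rendering of the strategy quoted from Chapter 4 (each hyphen-minus or space becomes one underscore), and the two names are the made-up examples from this report, not actual character names:

```python
import re


def name_to_identifier(name: str) -> str:
    """Replace every hyphen-minus or space with a single underscore."""
    return re.sub(r"[- ]", "_", name)


first = name_to_identifier("X- -Y")
second = name_to_identifier("X - Y")
print(first, second, first == second)  # X___Y X___Y True
```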

Date/Time: Mon Jun 25 08:58:44 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Wrong IPA symbol for voiceless velar fricative

In the Brahmi and Sharada sections of The Unicode Standard, the jihvamuliya
is described as the velar allophone of /h/ and is transcribed [χ], but it
should be [x]: [χ] is uvular, not velar.

Subject: Typo in TangutSources.txt
Date: Tue, 3 Jul 2018 21:34:27 -0400
From: David Corbett

TangutSources.txt lists “Viacheslave Zaytsev” as a co-author of
UTN #42, but the technical note itself spells it “Viacheslav Zaytsev”.

(Note: this has been corrected in the Unicode 12.0 draft file for Tangut sources.)

Date/Time: Mon Jul 23 08:00:45 CDT 2018
Name: David Corbett
Report Type: Error Report
Opt Subject: Georgian sentence breaking

Because the Mkhedruli letters were changed in Unicode 11.0 from Lo to Ll, 
there is no longer a sentence break in “ა. ბ”, but there should be, because 
Georgian sentences still begin with Mkhedruli letters. The newly encoded 
Mtavruli letters are not used for sentence casing.
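
The General_Category values involved can be listed with Python's unicodedata module. This is only a sketch: the standard library does not expose the Sentence_Break property, so it shows General_Category alone, and the values printed depend on the Unicode version of the interpreter's database (unicodedata.unidata_version). The helper name general_categories is hypothetical:

```python
import unicodedata


def general_categories(text: str):
    """Return (code point label, General_Category) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


# The example from the report: Mkhedruli AN, full stop, space, Mkhedruli BAN.
example = "\u10D0. \u10D1"

print("Unicode database:", unicodedata.unidata_version)
for code_point, category in general_categories(example):
    print(code_point, category)
```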

Other Reports

(None this period.)

Issue	Name	Feedback Link
378	Draft UTR #53, Unicode Arabic Mark Rendering	(feedback) No feedback at this time
