Accumulated Feedback on PRI #428

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Fri Feb 12 09:39:14 CST 2021
Name: Jan Nijtmans
Report Type: Public Review Issue
Opt Subject: typo in UnicodeData-14.0.0d4.txt

Note: This has now been fixed in the Alpha data file.

In UnicodeData-14.0.0d4.txt, there's the following line (line 17988):

105B3;VITHKUQI SMALL LETTER SE;Ll;0;L;;;;;N;;;1058C;;1059C

But code point "1059C" is "VITHKUQI SMALL LETTER DE". I suspect
this is a typo: the "9" should have been an "8", since
"VITHKUQI CAPITAL LETTER SE" makes more sense as the
titlecase variant of this character.

Thanks,
    Jan Nijtmans
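
As an illustration, the mismatch reported above could be caught mechanically. The following is a minimal sketch, assuming the usual semicolon-separated UnicodeData.txt layout in which the last three fields are the simple uppercase, lowercase, and titlecase mappings. The helper name `case_mappings` is hypothetical; the input line is the one quoted in the report.

```python
def case_mappings(line):
    """Return (code, upper, lower, title) from one UnicodeData.txt line."""
    fields = line.strip().split(";")
    # Field 0 is the code point; fields 12, 13 and 14 hold the simple
    # uppercase, lowercase and titlecase mappings (possibly empty).
    return fields[0], fields[12], fields[13], fields[14]


line = "105B3;VITHKUQI SMALL LETTER SE;Ll;0;L;;;;;N;;;1058C;;1059C"
code, upper, lower, title = case_mappings(line)

# Heuristic (an assumption, not a rule of the standard): for an ordinary
# lowercase letter the titlecase mapping is expected to equal the
# uppercase mapping, so a difference is worth a second look.
mismatch = upper != title
print(code, "upper:", upper, "title:", title, "MISMATCH" if mismatch else "ok")
```

A check like this only flags candidates for review; a few characters legitimately have different uppercase and titlecase mappings.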

Date/Time: Sat Feb 13 15:58:46 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Recommendations on the alpha code charts

Note: This feedback has been taken into account in updated annotations for the NamesList.txt file.

    Combining Diacritical Marks supplement

  1. The "combining dot above left" should have a reference to the "Syriac feminine dot"

    Latin Extended-D

  1. The header above the Old Polish letters could read "Additional medieval letters"
	rather than just "Additional letters"
  2. The "closed insular g" letters should have reciprocal cross references to the
	regular "insular g" letter. Likewise, the "middle Scots s" letters should have
	reciprocal cross references to the "sharp s" and "capital sharp s", and the
	double thorn and double wynn to their regular counterparts.
  3. The header above the 2 modifier letters for Chatino, should have "(México)" appended 
	like the Mazahua letters.
  4. The header above the "modifier letter capital q" should read "Modifier letter for 
	phonemic transcription of Japanese", so the bullet note below can be removed and 
	replaced with a mutual cross reference to the "small capital q"

    Latin Extended-F

  1. Is there any reason why the "Modifier letter small capital aa" does not have a 
	<super> decomposition with the regular letter?

    Brahmi

  1. The entire new section should be in a single header saying "Old Tamil extensions" 
	and the note under that should be removed anyway.
  2. The position of the Old Tamil LLA, does not follow the usual order that the Indic 
	code-charts follow, because the consonant should come before the vowel signs, 
	but that isn't a big priority.
  3. The new Tamil Virama should have a bullet note stating that it is a "pure killer" 
	and maybe a similar note for the original Virama saying that it produces conjuncts.
  4. The "Anusvara sign" should be annotated to indicate that it shouldn't be used as
	a replacement for the Tamil Virama (this is what was done in the code-chart for Tamil).

    Musical Symbols

  1. The header directly above the new accidentals should be dropped, and the note under the 
	first header should be changed to read "These two characters are used in Iranian 
	music notation to represent quarter notes."

Date/Time: Sun Feb 14 09:01:03 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Incorrect CCC of U+10F83

Note: This has now been corrected in the Alpha data file.

Proposed Character U+10F83 OLD UYGHUR COMBINING DOT BELOW currently has 
canonical combining class 230 (Above), but the correct value would be 220 (Below).
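
As an illustration, a name-based sanity check can surface this kind of inconsistency. The sketch below is only a heuristic under stated assumptions: the function `suspicious_ccc` is hypothetical, and it assumes that a mark whose name ends in BELOW normally has class 220 and one ending in ABOVE normally has class 230, which is not true of every combining mark. The two reference characters are looked up with Python's `unicodedata` module.

```python
import unicodedata


def suspicious_ccc(name, ccc):
    """Flag a name/combining-class pair that looks contradictory."""
    if name.endswith(" BELOW") and ccc == 230:
        return True
    if name.endswith(" ABOVE") and ccc == 220:
        return True
    return False


# Long-established marks, used here as reference points.
dot_below = unicodedata.combining("\u0323")  # COMBINING DOT BELOW
dot_above = unicodedata.combining("\u0307")  # COMBINING DOT ABOVE
print(dot_below, dot_above)

# The value reported for U+10F83 in the draft data, and the corrected one.
print(suspicious_ccc("OLD UYGHUR COMBINING DOT BELOW", 230))
print(suspicious_ccc("OLD UYGHUR COMBINING DOT BELOW", 220))
```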

Date/Time: Mon Feb 15 19:56:28 CST 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Suggestions on the alpha code chart of Diacritical Marks Extended

1. Whenever a header says "Used in...", it should instead read "Marks for..."

2. The header above 1AC1 should say (after the current header) "... Do not use 
pairs of these marks as replacement for 1ABB or 1ABD"

3. The two marks "combining double plus above and below" should be moved up, 
to be next to the single "plus sign above" and the Ormulum marks shifted 
down two spots.

4. The bullet note above the "number sign above" currently reads "used 
extensively in J.P. Harrington’s transcriptional notation" I suggest 
for it to read "Used by J.P. Harrington to indicate heavy or contrastive stress"

5. The "combining triple acute accent" should have a mutual cross reference 
to the "combining double acute accent"

Date/Time: Sun Feb 14 08:59:40 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Incorrect decomposition mapping of U+107A9

Note: This has now been corrected in the Alpha data file.

Proposed character U+107A9 MODIFIER LETTER SMALL R WITH FISHHOOK currently
decomposes to U+207E SUPERSCRIPT RIGHT PARENTHESIS, but the correct mapping
would be to U+027E LATIN SMALL LETTER R WITH FISHHOOK.
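
As an illustration, the two mappings can be told apart by the General_Category of the target. This is a minimal sketch, assuming that a modifier letter's compatibility decomposition should point at a letter; the function name `decomposition_target_is_letter` is hypothetical, and both target code points are long-standing characters known to Python's `unicodedata` module.

```python
import unicodedata


def decomposition_target_is_letter(mapping):
    """Given a mapping such as '<super> 027E', check that every target is a letter."""
    # Drop the formatting tag (e.g. '<super>') and keep the hex code points.
    targets = [part for part in mapping.split() if not part.startswith("<")]
    return all(
        unicodedata.category(chr(int(cp, 16))).startswith("L") for cp in targets
    )


draft_ok = decomposition_target_is_letter("<super> 207E")  # superscript right parenthesis
fixed_ok = decomposition_target_is_letter("<super> 027E")  # small letter r with fishhook
print(draft_ok, fixed_ok)
```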

Date/Time: Sun Feb 14 09:29:15 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: U+1CF42 and U+1CF43 have nonconformant names

The names of proposed characters U+1CF42 (ZNAMENNY PRIZNAK MODIFIER LEVEL 2)
and U+1CF43 (ZNAMENNY PRIZNAK MODIFIER LEVEL 3) currently do not conform to
section 4.8 of the Unicode Standard. A hyphen-minus needs to be inserted
before the final digit in both names because a digit must not immediately
follow a space.
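
As an illustration, the single rule cited here (a digit must not immediately follow a space) is easy to test. The sketch below checks only that one rule, not the full set of name constraints in the standard; the function name `digit_follows_space` is hypothetical, and the hyphenated names are the forms the report implies.

```python
import re

# A space immediately followed by an ASCII digit.
DIGIT_AFTER_SPACE = re.compile(r" [0-9]")


def digit_follows_space(name):
    """Return True if the name violates the 'no digit right after a space' rule."""
    return DIGIT_AFTER_SPACE.search(name) is not None


draft_names = [
    "ZNAMENNY PRIZNAK MODIFIER LEVEL 2",
    "ZNAMENNY PRIZNAK MODIFIER LEVEL 3",
]
fixed_names = [name[:-2] + "-" + name[-1] for name in draft_names]

for name in draft_names + fixed_names:
    print(name, "->", "violation" if digit_follows_space(name) else "ok")
```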

Date/Time: Sun Feb 14 09:42:18 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: General category of U+1DF0A

Proposed character U+1DF0A LATIN LETTER RETROFLEX CLICK WITH RETROFLEX HOOK
currently has general category Ll (Lowercase_Letter). A more appropriate
value would be Lo (Other_Letter) which is shared by most other click
letters, including its hook‐less counterpart U+01C3 LATIN LETTER RETROFLEX
CLICK.

Date/Time: Sun Feb 14 09:58:07 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Names of U+1FAF1 and U+1FAF2

The names of proposed characters U+1FAF1 RIGHTWARD BACKHAND and U+1FAF2
LEFTWARD HAND could potentially be changed to RIGHTWARDS BACKHAND and
LEFTWARDS HAND respectively. The words “rightward” and “leftward” do not
occur in any other Unicode character names; instead the spellings
“rightwards” and “leftwards” are used every single time.

Date/Time: Sun Feb 14 10:01:09 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Defective glyph for U+1FAE2

The code chart glyph for proposed character U+1FAE2 FACE WITH OPEN EYES AND
HAND OVER MOUTH is inverted, showing a solidly filled face instead of an
outline drawing like the other faces.

Date/Time: Sun Feb 14 10:25:15 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Names of dezh and tesh digraphs with hooks

The names of the following proposed characters should be adjusted to include
the word “digraph” for consistency with their respective hook‐less
counterparts (U+02A4 LATIN SMALL LETTER DEZH DIGRAPH and U+02A7 LATIN SMALL
LETTER TESH DIGRAPH):

U+1DF12: LATIN SMALL LETTER DEZH WITH PALATAL HOOK → LATIN SMALL LETTER DEZH DIGRAPH WITH PALATAL HOOK
U+1DF17: LATIN SMALL LETTER TESH WITH PALATAL HOOK → LATIN SMALL LETTER TESH DIGRAPH WITH PALATAL HOOK
U+1DF19: LATIN SMALL LETTER DEZH WITH RETROFLEX HOOK → LATIN SMALL LETTER DEZH DIGRAPH WITH RETROFLEX HOOK
U+1DF1C: LATIN SMALL LETTER TESH WITH RETROFLEX HOOK → LATIN SMALL LETTER TESH DIGRAPH WITH RETROFLEX HOOK

Date/Time: Sun Feb 14 10:57:51 CST 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: General category of Znamenny priznak modifiers

The Znamenny priznak modifiers (U+1CF42..U+1CF46) were given the general
category Cf (Format). A more appropriate value would be Mn (Nonspacing_Mark)
because they apply directly to the preceding character, comparable to
variation selectors for instance. Other properties like bidi class and
grapheme cluster break would need to be adjusted accordingly as well.

Date/Time: Mon Feb 15 15:51:33 CST 2021
Name: Neil S Patel
Report Type: Public Review Issue
Opt Subject: Script Extensions for Arabic Punct used for N'ko and Adlam

Hello,

Recently, I have been working with a couple of W3C groups to look into
script itemization issues. We have noticed that with both Adlam and N'ko
when Arabic punctuation, typically used with both scripts, appears in a
string of text it triggers unexpected fall backs. This occurs even when the
tested font includes the appropriate Arabic punctuation. After some
discussion it was suggested that the script extensions could be responsible.


Reference: https://github.com/w3c/afrlreq/issues/18 

Currently the script extensions for Arabic punctuation are listed as follows. There are no references to African scripts.

# ================================================
# Script_Extensions=Arab Rohg Syrc Thaa Yezi

060C          ; Arab Rohg Syrc Thaa Yezi # Po       ARABIC COMMA
061B          ; Arab Rohg Syrc Thaa Yezi # Po       ARABIC SEMICOLON
061F          ; Arab Rohg Syrc Thaa Yezi # Po       ARABIC QUESTION MARK

# Total code points: 3
# ================================================



I would like to propose the following update to include Adlam and N'ko.

# ================================================
# Script_Extensions=Arab Nko Rohg Syrc Thaa Yezi

060C          ; Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC COMMA
061B          ; Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC SEMICOLON

# Total code points: 2
# ================================================

# ================================================
# Script_Extensions=Adlm Arab Nko Rohg Syrc Thaa Yezi

061F          ; Adlm Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC QUESTION MARK

# Total code points: 1
# ================================================

Thanks.
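
As an illustration, the proposed entries can be read into a lookup table to confirm which scripts each punctuation mark would cover. This is a minimal sketch, assuming the line format shown above (code point or range, a semicolon, a space-separated script list, then a `#` comment). The function name `parse_script_extensions` is hypothetical, and the script codes are copied exactly as they appear in the proposal.

```python
def parse_script_extensions(text):
    """Map each code point to the set of script codes listed for it."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and header lines
        if not line:
            continue
        code_field, scripts_field = (part.strip() for part in line.split(";", 1))
        # Entries may be a single code point or a range such as 0640..0645.
        if ".." in code_field:
            first, last = code_field.split("..")
        else:
            first = last = code_field
        for cp in range(int(first, 16), int(last, 16) + 1):
            result[cp] = set(scripts_field.split())
    return result


proposed = """\
# Script_Extensions=Arab Nko Rohg Syrc Thaa Yezi
060C          ; Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC COMMA
061B          ; Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC SEMICOLON
061F          ; Adlm Arab Nko Rohg Syrc Thaa Yezi # Po       ARABIC QUESTION MARK
"""

scx = parse_script_extensions(proposed)
for cp in sorted(scx):
    print(f"U+{cp:04X}", " ".join(sorted(scx[cp])))
```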

Date/Time: Tue Feb 23 21:42:01 CST 2021
Name: kirk miller
Report Type: Public Review Issue
Opt Subject: Character in Latin G is under wrong heading

Note: This feedback has been taken into account in updates for the NamesList.txt file.

In Latin Extended-G, the character:

  1DF07 𝼇 LATIN SMALL LETTER REVERSED ENG

is listed under the heading "IPA extensions".

It should appear under the preceding heading, "IPA letters for disordered
speech", as Michael Everson had it in his mapping. 

This can be accomplished by moving the heading "IPA extensions" down by one
character.

The error is easily verified with the chart for the extIPA alphabet for
disordered speech, published by the ICPLA. 

The IPA copy of the chart is available here:
https://www.internationalphoneticassociation.org/sites/default/files/extIPA_2016.pdf 

In that chart, the three letters REVERSED ENG, REVERSED K and REVERSED
SCRIPT G appear together as "velodorsal oral and nasal stops" in the
bottom-right table.

Date/Time: Fri Feb 26 15:42:43 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Sharada Code Chart

In the character list on Page 3, SHARADA VOWEL SIGN VOCALIC LL and SHARADA 
VOWEL SIGN E are overlapping. This needs to be fixed.

Date/Time: Fri Feb 26 15:56:05 CST 2021
Name: Vinodh Rajan
Report Type: Public Review Issue
Opt Subject: Telugu Nukta Glyph in the Code Chart

As per L2/20-085, Telugu Nukta should have the combining circle below as 
its representative glyph to avoid confusion with the aspirate marker. 

(If the current shape will be retained)
The annotation "can also appear as a large dot" is moot. The glyph is already a dot. 

V
 

Date/Time: Sat Feb 27 19:09:30 CST 2021
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: Annotations for Balinese surang and Sundanese panglayar

Note: This feedback has been taken into account in updates for the NamesList.txt file.

The annotations proposed in L2/20-150 were incorrectly transcribed into NamesList-14.0.0d7.txt:
– U+1B03 BALINESE SIGN SURANG should have the annotation “• also used for repha in transliteration of Kawi”.
– U+1B81 SUNDANESE SIGN PANGLAYAR should NOT have that annotation.

Cross references added in the names list appear to be intended to link the
two characters that are used for repha in transliteration of Kawi. To do so
correctly, the reference to U+A982 JAVANESE SIGN LAYAR needs to be moved
from 1B81 to 1B03, and the reference in A982 needs to refer to 1B03 rather
than to 1B81.

Date/Time: Sat Feb 27 22:19:11 CST 2021
Name: Norbert Lindenberg
Report Type: Public Review Issue
Opt Subject: UAX 44: Indic data for Toto

Note: This has been taken care of in the UAX #44 draft.

The proposed update for UAX 44, Unicode character database, has notes:

IndicPositionalCategory.txt
– Added values for characters in the newly encoded Toto script.
IndicSyllabicCategory.txt
– Appropriate Indic_Syllabic_Category property values were assigned 
to characters in the newly encoded Toto script.

The two data files this refers to are not available for review yet, 
but these notes assume that the Toto script has at least some of 
the characteristics of a Brahmic script that make Indic properties necessary.

According to the proposal L2/19-330, that is not the case: It states 
that "This Toto writing system is not syllable-based and doesn't have 
an inherent vowel." In addition, the combining class 230 for the 
U+1E2AE TOTO LETTER RISING TONE would be inappropriate if the script 
were Brahmic, as combining classes ≠ 0 are in general incompatible 
with the phonetic character order used for Brahmic scripts.

Date/Time: Mon Mar 1 15:54:54 CST 2021
Name: Lorna Evans
Report Type: Error Report
Opt Subject: Arabic U+089D..U+089F, U+08D0..U+08D2 have wrong property

Note: This has now been corrected in the Alpha data file.

These characters have "ON" in UnicodeData:

089D;ARABIC SUPERSCRIPT ALEF MOKHASSAS;Mn;230;ON;;;;;N;;;;;
089E;ARABIC DOUBLED MADDA;Mn;230;ON;;;;;N;;;;;
089F;ARABIC HALF MADDA OVER MADDA;Mn;230;ON;;;;;N;;;;;

and

08D0;ARABIC SUKUN BELOW;Mn;220;ON;;;;;N;;;;;
08D1;ARABIC LARGE CIRCLE BELOW;Mn;220;ON;;;;;N;;;;;
08D2;ARABIC LARGE ROUND DOT INSIDE CIRCLE BELOW;Mn;220;ON;;;;;N;;;;;

They should be "NSM".

See Unicode proposal: https://www.unicode.org/L2/L2019/19306-quranic-additions.pdf 
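
As an illustration, entries of this kind can be found by scanning for nonspacing marks whose Bidi_Class is not NSM. This is a minimal sketch, assuming the usual UnicodeData.txt field order (field 2 is General_Category, field 4 is Bidi_Class). The function name `marks_without_nsm` is hypothetical, and the sample reuses two of the lines quoted above plus one long-established mark for contrast.

```python
def marks_without_nsm(lines):
    """Return (code, bidi_class) for every Mn entry whose Bidi_Class is not NSM."""
    flagged = []
    for line in lines:
        fields = line.strip().split(";")
        code, general_category, bidi_class = fields[0], fields[2], fields[4]
        if general_category == "Mn" and bidi_class != "NSM":
            flagged.append((code, bidi_class))
    return flagged


sample = [
    "089D;ARABIC SUPERSCRIPT ALEF MOKHASSAS;Mn;230;ON;;;;;N;;;;;",
    "08D0;ARABIC SUKUN BELOW;Mn;220;ON;;;;;N;;;;;",
    "0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;",
]

flagged = marks_without_nsm(sample)
print(flagged)
```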

Date/Time: Mon Mar 1 16:47:35 CST 2021
Name: Erik Carvalhal Miller
Report Type: Public Review Issue
Opt Subject: PRI #428: Comment for U+02B9

The first comment for U+02B9 MODIFIER LETTER PRIME in block Spacing Modifier
Letters (unchanged in the 14.0 alpha) says, “primary stress, emphasis”; I
recommend either removing the word “primary” or else inserting the phrase
“secondary stress”, to better reflect the broad, varied use of the character
in marking stress, as the current wording is misleadingly specific.

Background & reference:  U+02B9ʼs use for primary stress in some
dictionaries is undisputed, but L2/20-286 shows excerpts from historical and
contemporary dictionaries in which phonetic spellings employ U+02B9 for
secondary stress as well.  (As reported in L2/21-016 §I.3o, the UTC rejected
L2/20-286ʼs proposal to separately encode a prime‐symbol variant that
represents primary stress in those excerpts, but the rejection does not
impinge on the secondary‐stress use in evidence.)

Date/Time: Tue Mar 2 13:52:43 CST 2021
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #428: decomposition of 107A9

Note: This has now been corrected in the Alpha data file.

107A9   MODIFIER LETTER SMALL R WITH FISHHOOK
               # <super> 207E

Decomposition of 107A9 must read <super> 027E (instead of <super> 207E).

Date/Time: Sun Mar 7 11:09:38 CST 2021
Name: Marc Lodewijck
Report Type: Public Review Issue
Opt Subject: PRI #428: Headers for 116B9 and for 11740 sqq.

Note: This feedback has been taken into account in updates for the NamesList.txt file.

1. As for 1183B DOGRA ABBREVIATION SIGN, the header above 116B9 TAKRI ABBREVIATION 
SIGN should be "Punctuation" and not "Sign".

2. Header for 11740..11746 (in the Ahom block) could be "Additional consonants" 
rather than "Additional consonants for Tai Ahom".

Date/Time: Mon Mar 15 15:25:18 CDT 2021
Name: jennifer daniel
Report Type: Public Review Issue
Opt Subject: Changing the names of two emoji alpha candidates

After getting feedback that was somehow missed last October, the ESC recommends 
that we change the names of two emoji alpha candidates:

Current Names

1FAC3    MAN WITH SWOLLEN BELLY

1FAC4    PERSON WITH SWOLLEN BELLY

Recommendation, Modified

1FAC3    PREGNANT MAN

1FAC4    PREGNANT PERSON


Rationale in the link, below. Given that we somehow missed this feedback we didn't 
want to wait until the next UTC meeting to make this recommendation. 

https://www.unicode.org/L2/L2021/21055-esc-response-fdbk.pdf 

Additional background info:
https://www.unicode.org/L2/L2021/21056-esc-gender.pdf 

Date/Time: Fri Mar 19 19:46:34 CDT 2021
Name: David Corbett
Report Type: Error Report
Opt Subject: Inconsistent identifier types for Komi letters

The obsolete Komi letters U+052A..U+052D have Identifier_Type=Obsolete but
the other obsolete Komi letters U+0500..U+050F have
Identifier_Type=Recommended.

Date/Time: Sun Mar 21 12:17:47 CDT 2021
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #428: Prepended_Concatenation_Mark

Note: This has now been added to the Alpha version of PropList.txt.

U+0890 ARABIC POUND MARK ABOVE and U+0891 ARABIC PIASTRE MARK ABOVE should
have Prepended_Concatenation_Mark=True.

Date/Time: Wed Mar 24 16:19:46 CDT 2021
Name: Lorna Evans
Report Type: Error Report
Opt Subject: U+08C8 ArabicShaping name

While I did laugh at this name in ArabicShaping, I think we could 
come up with a better name:

08C8; KEHEH WITH DOOHICKEY ABOVE; D; GAF

It seems that the Arabic Shaping name was never discussed as far as 
I can tell from script-adhoc notes, nor from UTC minutes.

L2/19-077 originally requested the character to be ARABIC LETTER 
KEHEH WITH HAMZA ABOVE which indicates to me there is some association 
with a hamza.

This was later changed to ARABIC LETTER GRAF in L2/19-252.

I would suggest something like this:

08C8; KEHEH WITH EXTENDED HAMZA ABOVE; D; GAF

Lorna
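
As an illustration, the entry under discussion splits into four fields, of which only the second (the schematic name) would change under this suggestion. This is a minimal sketch, assuming the four-field, semicolon-separated layout of the quoted line; the names `ShapingEntry` and `parse_shaping_line` are hypothetical.

```python
from typing import NamedTuple


class ShapingEntry(NamedTuple):
    code_point: int
    schematic_name: str
    joining_type: str
    joining_group: str


def parse_shaping_line(line):
    """Parse one 'code; name; joining type; joining group' line."""
    fields = [field.strip() for field in line.split("#", 1)[0].split(";")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    return ShapingEntry(int(fields[0], 16), fields[1], fields[2], fields[3])


current = parse_shaping_line("08C8; KEHEH WITH DOOHICKEY ABOVE; D; GAF")
suggested = parse_shaping_line("08C8; KEHEH WITH EXTENDED HAMZA ABOVE; D; GAF")

# Only the schematic name differs; joining type and group are untouched.
same_behaviour = current[2:] == suggested[2:]
print(current, same_behaviour)
```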

Date/Time: Wed Mar 31 15:54:10 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Final round of revision to the codechart anottations, but the second half correspond to the pictograms

The first half corresponds to annotations that I missed the first two
rounds, but the second half corresponds to the pictograms.

Arabic:
  06C5 ARABIC LETTER KIRGHIZ OE: On the second bullet note, instead of reading
  "a barred form also occurs", it would be better if it read "a glyph variant
  replaces the looped tail with a horizontal bar through the tail"

Arabic Extended-B: 
  088E ARABIC VERTICAL TAIL: The header above this character should read 
  "Abbreviation mark" instead of "Abbreviation letter" A better phrasing of 
  the bullet note below would be "mark used to indicate abbreviations in moveable 
  type texts from Iran" followed by another note saying: "considered a letter; 
  only attested in final form"

Glagolitic:
  2C2F GLAGOLITIC LETTER CAUDATE CHRIVI: The bullet note cites the characters 
  it can combine with, but the glyphs with the dotted circle are missing. 
  Furthermore, informative aliases should be added "= cherv, chrivi with tail"

Arabic Presentation Forms-A:
  FDCF ARABIC LIGATURE SALAAMUHU ALAYNAA: Another bullet note could be added 
  stating "used in Christian texts"

Kana Extended-B:
  The initial note states that the system in question is "obsolete", which 
  seems to imply that it was replaced by another system, and it also states that 
  it was used in Taiwan; which is true, but it was also used in a nearby region 
  of mainland China.

Ethiopic Supplement:

  Given the new information of the legacy Gurage orthography the header
  above 1380 that reads "Syllables for Sebatbeit" should read "Legacy
  syllables for Gurage orthographies" Followed by a note under this header
  saying "These characters were originally encoded to represent the
  Sebatbeit language, but their use extended beyond that language to an
  entire linguistic region called 'Gurage'; therefore the term 'Sebatbeit'
  inserted in the character names, should not be interpreted as exclusionary
  to other languages, but a mere historical artifact. The orthography for
  the Gurage languages has been updated to use new syllables and these are
  encoded in the 'Ethiopic Extended-B' block." It's unclear if the header
  above 2DC0 (in the Ethiopic Extended block) should also be modified
  accordingly, but the block descriptions in the Spec, should be updated
  accordingly.

Transport and Map Symbols:
  1F6DE WHEEL: The informative alias "= tire" could be added
  1F6DF LIFE BUOY: The informative alias "= life saver" could be added

Geometric Shapes Extended:
  1F7F0 BOLD EQUALS SIGN: The addition of this symbol in this block (as opposed 
	to Symbols and Pictographs Extended-A) is dubious.

Symbols and Pictographs Extended-A:
  1FA74 THONG SANDAL: These informative aliases "= flip flop, chancla" could be added
  1FA78 DROP OF BLOOD: Mutual cross references to "1F4A7 💧 droplet" and "1F322 🌢 black droplet" could be added
  1FA79 ADHESIVE BANDAGE: The informative alias "= band aid" could be added.
  1FA85 PINATA: A bullet note could be added stating "the name is usually spelled 
	with an 'Ñ'(PIÑATA) but Unicode names can only contain ASCII characters"
  1FAAA IDENTIFICATION CARD: There should be an informative alias stating "= ID", 
	as well as a bullet note stating "can be used to represent a driver's license or any other form of photo id"
  1FAAB LOW BATTERY: There should be a mutual cross reference to "1F50B 🔋 battery"
  1FAAC HAMSA: A bullet note could be added stating "can either point up or down".
  1FAE6 BITING LIP: A mutual cross reference to "1F5E2 🗢 lips" could be added
  1FAF6 HEART HANDS: There is no need for the rays emanating from the "heart"; leaving 
	them may imply that their inclusion is mandatory, so I recommend removing them 
	from the representative glyph. I would also like to ask, whether or not this 
	character can support different skin tones for each hand, in the future; 
	similar to the HANDSHAKE.

Date/Time: Thu Apr 1 19:17:17 CDT 2021
Name: Eduardo Marín Silva
Report Type: Other Question, Problem, or Feedback
Opt Subject: Request to correct errata in my own piece of feedback of the Unicode 14.0 alpha

My last piece of feedback was accidentally called "Final round of revision to the 
codechart anottations, but the second half correspond to the pictograms" with the 
second half added by mistake, so it should instead read "Final round of revision 
to the codechart annotations" with the corrected spelling of 'annotations'
If it's possible, I also noticed that my piece of feedback for the ARABIC VERTICAL 
TAIL reads "considered a letter; only attested in final form", when it should read 
"considered a letter, not a presentation form, but only attested in final form"
Any other mistakes in my pieces of feedback are minor and so do not need correction.

Date/Time: Sat Apr 3 11:31:51 CDT 2021
Name: Ivan Panchenko
Report Type: Error Report
Opt Subject: Error in Egyptian Hieroglyphs file


The Egyptian Hieroglyphs file (U13000.pdf) contains the misspelling
“Invertabrata”. The correct spelling (which was also used by Gardiner) is
“Invertebrata”.

Date/Time: Sat Apr 10 11:49:32 CDT 2021
Name: r00ster
Report Type: Error Report
Opt Subject: Chinese numerals are not classified as numerals

Hello Unicode,

I noticed that you classify Chinese numerals as Lo (other letters) which
does not seem very correct to me because I believe Chinese numerals should
be classified as numerals and not as other letters.

If I go to articles listed on the right of
https://en.wikipedia.org/wiki/Numeral_system and try out a few characters
listed on these articles, they mostly work (except for some rather outdated
scripts such as Tangut numerals) and they are detected by Unicode as
numeric, but for Chinese numerals, this is not the case. None of the
numerals are detected as numeric. Especially for such a widely spoken
language I would expect Unicode to correctly classify the numerals of that
language. It is true that in Chinese there is an overwhelmingly large amount
of (single) numeral characters, but I believe it is possible to maybe just
classify at least the very basic 零/〇、一、二、三、四、五、六、七、八、九 (0-9) as numerals,
and leave all other numerals beyond that classified as other letters.

Is it possible for you to reclassify them as numerals in a future version?
See also: https://github.com/rust-lang/rust/issues/84056. Classifying
Chinese numerals as numerals will of course mean support for other East
Asian languages too, such as Japanese and Hokkien.

Thank you in advance.
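
As an illustration, the report turns on the difference between General_Category and the separate numeric properties. The sketch below uses Python's `unicodedata` module, whose data includes the numeric values assigned to these ideographs; whether a given library treats them as "numeric" depends on which property it consults, and nothing here describes the behaviour of any other language's standard library.

```python
import unicodedata

digits = "一二三四五六七八九"

# For each ideograph: General_Category and the numeric value on record.
rows = [(ch, unicodedata.category(ch), unicodedata.numeric(ch)) for ch in digits]
for ch, category, value in rows:
    print(ch, category, value)

# A category-based test and a numeric-property-based test disagree.
by_category = all(category.startswith("N") for _, category, _ in rows)
by_numeric_property = all(ch.isnumeric() for ch in digits)
print(by_category, by_numeric_property)
```

In other words, an API that classifies by general category alone will report these characters as letters, while one that consults the numeric value will report them as numeric.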

Date/Time: Sat Apr 10 18:49:00 CDT 2021
Name: Mikoto Ohtsuki
Report Type: Public Review Issue
Opt Subject: 1B11F-1B122 in Unicode 14.0 Alpha (PRI #428: Unicode 14.0 Alpha Review)

If the kana letters proposed at 1B11F-1B122 became candidates for Unicode 14
based on L2/19-381, the rationale seems insufficient.  AFAIK, they are assumed to
be inventions made primarily to fill empty cells in the syllabary chart
called gojuonzu (50-sound chart).  They usually appear in gojuonzu
compiled around the late 19th to early 20th century, and lack examples in
text where they are actually used to spell words in accordance with the proposed
characteristics.

The existence of a YI syllable separate from the I syllable, or of a WU syllable
separate from the U syllable, has not been attested in the history of Japanese
phonology or orthography.  It is therefore not possible that
native Japanese words such as いもうと, まうす, ようべ on page 6 of L2/19-381, and
やいば, ついたち, ちひさい on page 10, were written using kana intended for the WU or YI
syllable.  Note that standard う (U) is used in the corresponding hiragana forms
on page 6 instead of a kana intended for WU.  The chart contradicts
itself.

Pages 2 and 7 show 衣 (U+8863) as the Kanji Derivation for 1B12D, now shifted to
1B121, KATAKANA LETTER ARCHAIC YE.  However, 衣 is the origin of 1B000 KATAKANA
LETTER ARCHAIC E.  This is evidently inconsistent.  Rather than treating this
kana as derived from a single kanji, it would be more appropriate to treat it
as a compound form of イ (I) and エ (E), as mentioned in the footnote.  It would
then be KATAKANA LETTER LIGATURE IE.

It is strongly suspected that the referenced books were written without
scholarly knowledge.  Including these characters in Unicode 14 with their
current characteristics is questionable.  I'd like the UTC to consider two matters.

First, please postpone their inclusion in the Unicode Standard until their
characteristics are confirmed by expert input, or until examples of actual use
with the proposed characteristics are provided.  If such input is unavailable,
please consider another approach, such as encoding them as itaigana (kana variants)
of the standard I and U kana letters.

Second, please reconsider their names.  Using the same ARCHAIC prefix for both
kana dating from the Heian era (8th-12th century) and kana invented in the
early modern period in a `there should be something to fill up the gojuonzu`
attempt feels odd.  Please don't call the latter kana ARCHAIC.

Date/Time: Sun Apr 11 02:28:55 CDT 2021
Name: Patrik Sjöwall
Report Type: Public Review Issue
Opt Subject: Unicode 14.0 Alpha review


I found a few issues with some characters for Unicode 14.0 that seem to have
gone unnoticed:

0874 ARABIC LETTER ALEF WITH ATTACHED KASRA
0875 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA
0879 ARABIC LETTER ALEF WITH ATTACHED ROUNDDOT BELOW
087C ARABIC LETTER ALEF WITH RIGHT MIDDLE STROKE AND DOT ABOVE
087D ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND DOT ABOVE
0880 ARABIC LETTER ALEF WITH ATTACHED BOTTOM RIGHT KASRA AND LEFT RING

These letters require more shaping information. It is not clear how the
attached fatha or dot will behave in an obligatory LAM-ALEF ligature.


088E ARABIC VERTICAL TAIL

This character is missing in ArabicShaping-14.0.0.txt, but it always joins
with the preceding letter. It should be included in that file, either as
Right_Joining or be given a new joining type (since it does not change its
shape, only causes the character to its right to join), and with either a
joining group of its own or No_Joining_Group.


08FB ARABIC DOUBLE RIGHT ARROWHEAD ABOVE
08FC ARABIC DOUBLE RIGHT ARROWHEAD ABOVE WITH DOT

The comment "also used in Quranic text in African and other orthographies to
represent dammatan" should come after 08FB, not 08FC. The "right arrowhead"
is an angular-shaped damma, and the "dammatan" is a double damma (not a
double damma with dot).


A7C0 LATIN CAPITAL LETTER OLD POLISH O
A7C1 LATIN SMALL LETTER OLD POLISH O

This letter should be named "O ROGATE", the name "commonly used among
specialists" according to the proposal. Then a comment below could say "used
for nasal vowel in Old Polish". The current name sounds like this was a
letter used instead of "O" in Old Polish, which is not the case.


A7D3 LATIN SMALL LETTER DOUBLE THORN
A7D5 LATIN SMALL LETTER DOUBLE WYNN

These two small letters are added to the standard without matching capitals.
That is inconsistent with how other comparable letters are encoded. Letters
used in a casing orthography are almost always encoded as casing pairs, even
if they do not appear at the beginning of a word and the capital letter thus
only appears in ALL-CAPS TEXT. As far as I know at least the following
capitals were encoded without being needed outside all-caps:

    0184 LATIN CAPITAL LETTER TONE SIX
    01A6 LATIN LETTER YR
    01A7 LATIN CAPITAL LETTER TONE TWO
    01BC LATIN CAPITAL LETTER TONE FIVE
    0220 LATIN CAPITAL LETTER N WITH LONG RIGHT LEG
    037F GREEK CAPITAL LETTER YOT
    042A CYRILLIC CAPITAL LETTER HARD SIGN
    042C CYRILLIC CAPITAL LETTER SOFT SIGN
    1E9E LATIN CAPITAL LETTER SHARP S
    2C1F GLAGOLITIC CAPITAL LETTER YERU
    2C20 GLAGOLITIC CAPITAL LETTER YERI

It is possible that one or two have been used word-initially in languages
that were not supported when they were added. On the other hand, it is also
quite likely that there are more encoded capitals that never occur in the
beginning of a word.

Apart from that (and issues already addressed by others) everything looks
fine so far.

Best regards!
/Patrik Sjöwall

Date/Time: Sun Apr 11 05:17:33 CDT 2021
Name: Wang Yifan
Report Type: Public Review Issue
Opt Subject: PRI #428: comments on U+1F7F0 and U+1F979


On U+1F7F0:
Might be good to have a cross-reference to U+3013 GETA MARK 
for pure graphic resemblance, and vice versa.

On U+1F9F9:
The current glyph of FACE HOLDING BACK TEARS does not sufficiently 
distinguish it from U+1F9FA FACE WITH PLEADING EYES. A quick 
suggestion that I think effective is to paint tears white 
(non-hatched) and use a dumbbell-shaped mouth.

In the light of the original proposal, this character is 
intended to include the Samsung emoji depicted in the 
page 1 of this document.
http://www.unicode.org/L2/L2020/20064-face-holding-back-tears.pdf

Here, the dumbbell-shaped mouth is a key feature characterizes the emoticon
being a stylized depiction of the lip-biting expression in the East Asian
graphical convention. It is different from both upward (pouting) and
downward (neutral-smiling) curled mouth. This type of expression is also
seen in most of the actual examples cited in the page 5 of the proposal,
thus should not be left out.

Meanwhile, there is U+1F9FA that usually implemented with similarly watery
eyes. (See https://emojipedia.org/pleading-face/)

Even though not reflected in the current code chart, such designs should be
interpreted as the inherent semantics in the original proposal (as FACE WITH
GLISTENING EYES;
https://www.unicode.org/L2/L2017/17244r-emoji-faces-v11.pdf) instead of mere
vendors' discretion, and should be respected as such.

The alpha glyph of U+1F979 has a rather intricate eye design that makes
it hard to tell the tears apart from the eyeballs in black-and-white
printing. The tears should be more distinctly separated graphically from
their background, to avoid the misinterpretation that it has exactly the
same kind of eyes as the existing glyphs of U+1F97A. (Optimally, U+1F97A
should also be updated to have more upward-looking eyes and
downward-sloping eyebrows in the code chart.)

Last year, U+1F97A was "the third most used emoji on Twitter" according to
Emojipedia, and was awarded "Neologism of the Year 2020" in Japan. Special
care should be taken to avoid possible confusion among existing users.

https://blog.emojipedia.org/a-new-king-pleading-face/
https://ja.wikipedia.org/wiki/%E3%81%B4%E3%81%88%E3%82%93

Date/Time: Mon Apr 12 16:47:19 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Combining Diacritical Marks Extended

Move COMBINING DOUBLE PLUS SIGN ABOVE and COMBINING DOUBLE 
PLUS SIGN BELOW to immediately after COMBINING PLUS SIGN ABOVE.

Date/Time: Mon Apr 12 17:58:47 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Encode COMBINING OVERCURL at 1ACF


COMBINING OVERCURL was first proposed on 2017-09-27 in L2/17-342 (N4902). A
revised proposal was published on 2017-10-17 in L2/17-358 (N4907). An
attempt was made to ballot it not as a single combining character but as a
number of atomic letters. The argument for atomic characters was basically "It
might be hard to implement in Noto fonts", which I consider to be pretty
ridiculous. Irish ballot comments on 2018-12-20 refuted this, saying that
the OVERCURL is definitely an abbreviation mark, not a basic orthographic
letter, so it would be both inappropriate and impractical to have to work
with atomic characters when all of the other related marks used in medieval
palaeography (COMBINING OVERLINE, COMBINING INVERTED BREVE, COMBINING
FERMATA) are treated as the diacritical marks they are. The atomic
characters were taken off the ballot evidently because US ballot comments
now said that the COMBINING OVERCURL was nothing more than a glyph variant
of COMBINING INVERTED BREVE. No justification for this assertion was made.
The COMBINING OVERCURL was balloted again at 1DFA. Irish ballot comments on
2019-05-06 reaffirmed Ireland's support for the combining character, but
again it was taken off the ballot. In that document, Irish ballot comments
contained a draft UTN describing the rules for drawing glyphs with a
combining overcurl. (The basic rule is "The OVERCURL simply has to attach at
a convenient point, and swing over towards the left." This is not something
that any competent font designer would fail at doing.)

The proposal documents clearly described the use of these kinds of marks.
COMBINING OVERLINE and COMBINING INVERTED BREVE are typically used to
indicate an -m or an -n following the previous letter. COMBINING OVERCURL is
used to indicate an -m or an -n, but often it is meaningless, and in Middle
Scots when over an s it means "shilling" (L2/20-267 (N5144)); in Middle
English when following an r it may mean -e or it may mean nothing. The
COMBINING INVERTED BREVE does not have this polyvalence. 

COMBINING OVERCURL is not a glyph variant of either COMBINING OVERLINE or
COMBINING INVERTED BREVE. The UCS contains in Latin Extended-B twelve
characters used in South Slavic poetics (Ȃ ȃ Ȇ ȇ Ȋ ȋ Ȏ ȏ Ȓ ȓ Ȗ ȗ) and
certainly no one would ever consider an overcurl glyph-variant on these
letters to be acceptable. The suggestion that the US NB has made, that it
must be proved that COMBINING OVERCURL isn't a glyph variant of COMBINING
INVERTED BREVE is based on nothing but a casual assumption. But the proposal
documents show that the OVERCURL can mean -m, -n, -e, Ø or be a complete
abbreviation (as in shilling), and the INVERTED BREVE can only mean (and
always does mean in medieval British palaeography) -m and -n, or a tone
contour in South Slavic.

I prepared but have not yet published a document showing how the existing
COMBINING ZIGZAG ABOVE is a free-floating diacritical mark in continental
Europe, but in Britain grows a tail and attaches to the base letter. We do
not need a "combining attaching zigzag" and with regard to the overcurl I
showed that if we simply take a half-arc and rotate it 45 degrees over a
dotted circle, and if a font were to implement that without fusing the
OVERCURL to the base letter, it would remain legible, and indeed in my own
work my monowidth font does not have fused forms while my publication fonts
do. (The OVERCURL is still much bigger than the INVERTED BREVE.) The Script
Ad Hoc has seen this draft and accepts that the fusion aspect of rendering
is not a real problem.

In 2020 Volume I of Corpus Textuum Cornicorum "The Charter Fragment and
Pascon agan Arluth" was published, making use of the COMBINING OVERCURL in
both transcriptions and as a combining character in descriptions of the
abbreviations used. The datafiles cannot be published because they contain
one Private-Use character, and that defeats the purpose of plain-text
encoding. The suggestion that OVERCURL is a stylistic glyph variant of
INVERTED BREVE means that the data regarding which BREVES are to fuse and
which are not would be left to a higher level protocol, which is
inappropriate because of the semantics (again m/n on the one hand and
m/n/e/Ø/etc on the other).

To summarize:
1) INVERTED BREVE and OVERCURL do not have the same semantics.

2) INVERTED BREVE and OVERCURL do not have the same shapes.

3) A draft UTN outlining the rendering issues exists. It gives clear and
simple advice to any typographer.

4) Unfused forms using a large tilted OVERCURL which happens not to fuse are
legible and preserve in plain text the distinction required. This is
analogous to the Continental and British variation of the COMBINING ZIGZAG.

5) Palaeographic readings of the two earliest Middle Cornish texts have been
published and more texts are being prepared which will also distinguish
OVERCURL and INVERTED BREVE.

6) Palaeographic readings of the New Testament in Middle Scots are being
prepared and this text too distinguishes OVERCURL and INVERTED BREVE.

7) The OVERCURL form is not an acceptable glyph variant for South Slavic
poetics, and cannot be applied to 0202, 0203, 0206, 0207, 020A, 020B, 020E,
020F, 0212, 0213, 0216, or 0217. In Middle Scots, however, the COMBINING
OVERCURL occurs over vowels even word-internally, as in the word ȋto 'into'
(I had to use the inverted breve because the overcurl isn't encoded) and it
is definitely NOT an unattached breve.
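
The Latin Extended-B letters referred to in point 7 can be checked against
the Unicode Character Database. The following is a minimal sketch using
Python's standard unicodedata module; the helper name
inverted_breve_letters and the scanned range 0200..0217 are illustrative
choices, not part of the feedback above. It lists the precomposed letters
whose names end in WITH INVERTED BREVE, together with their canonical
decompositions.

```python
import unicodedata


def inverted_breve_letters(start=0x0200, end=0x0217):
    """Return code points in [start, end] named '... WITH INVERTED BREVE'.

    The range is an assumption chosen to cover the South Slavic
    poetics letters in Latin Extended-B.
    """
    found = []
    for cp in range(start, end + 1):
        # name() returns the default "" for unassigned code points
        name = unicodedata.name(chr(cp), "")
        if name.endswith("WITH INVERTED BREVE"):
            found.append(cp)
    return found


letters = inverted_breve_letters()
for cp in letters:
    ch = chr(cp)
    # decomposition() gives the canonical decomposition as hex code points
    print(f"{cp:04X} {ch} {unicodedata.name(ch)} -> {unicodedata.decomposition(ch)}")
```

Each of these letters decomposes canonically to a base letter followed by
U+0311 COMBINING INVERTED BREVE, which is why a change to the rendering of
that mark would affect all of them.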

Please encode COMBINING OVERCURL at 1ACF. Further delay is of no benefit to anybody.

Date/Time: Mon Apr 12 18:08:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Currency Symbols

Like the EURO SIGN and other characters, the SOM SIGN U+20C0 should be shown
in a Times-like font. 

Date/Time: Mon Apr 12 18:09:37 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Punctuation

The barred square brackets from 2E56..2E58 should be drawn on the same basis
as other square brackets in the code charts. 

Date/Time: Mon Apr 12 18:12:06 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Glagolitic

The glyphs for the two new characters must be improved. 

Date/Time: Mon Apr 12 18:21:57 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Supplemental Symbols and Pictographs

Something is wrong with the glyphs for 1F979 and 1F97A. The face shown at
1F979 looks just like the glyph for 1F97A in the macOS and iOS Apple Color
Emoji UI font.

Thanks for keeping my TROLL glyph. 

Date/Time: Mon Apr 12 18:32:17 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A

We needed a MIRROR BALL? Saints preserve us.
 
Nests both with and without eggs. I'm sure I need a STEGOSAURUS and a
TRICERATOPS much more.

The glyph for 1FAE1 is pretty illegible. The glyph for 1FAE2 must have a
direction error.

I can't imagine what a DOTTED LINE FACE is intended to represent.
Invisibility? It is a TERRIBLE name.

I suppose the new Hand symbols are welcome but I think we still have the
problem of the thumbs-up and thumbs-down emojis not being what the viewer
would actually see if he were looking at his own hand. Try it.

It is still an appallingly US-centric oversight that 1F594 🖔 REVERSED VICTORY
HAND has not been emojified. This is used everywhere in Britain and Ireland.
It is a weaker form of 🖕 REVERSED HAND WITH MIDDLE FINGER EXTENDED. This has
been mentioned (and ignored) before. But now we get a MIRROR BALL and two
kinds of nest.

Date/Time: Wed Apr 14 12:31:43 CDT 2021
Name: Eduardo Marín Silva
Report Type: Public Review Issue
Opt Subject: Alpha Review: CJK Unified Ideograph Extension B

This proposal (https://www.unicode.org/L2/L2018/18063-remove-ucs2003-ext-b.pdf)
proposed removing the UCS2003 glyphs from the code chart, and it was
accepted by the IRG. However, the current version 14 alpha charts still
retain them.

Removing the glyphs would allow four columns of width 2 to fit on each
page, rather than the current three columns of width 3. This in turn would
substantially reduce the number of pages in the code chart, reducing the
memory strain caused by trying to consult the charts.

Remaking the code chart is far from a trivial task; however, I mention this
to get some sort of update on the issue (given the time that has passed
since it was introduced).

Date/Time: Wed Apr 14 17:11:14 CDT 2021
Name: Paul Masson
Report Type: Error Report
Opt Subject: kPhonetic for U+52E4

This character appears in group 574 on p.85 of Casey. The field is missing 
in the database and needs to be added.

Date/Time: Fri Apr 16 09:40:46 CDT 2021
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: PRI #428: Soft_Dotted property of U+1DF1A

Proposed character U+1DF1A LATIN SMALL LETTER I WITH STROKE AND RETROFLEX HOOK 
should have the Soft_Dotted property like other variants of the letter i.

Date/Time: Fri Apr 16 17:24:19 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Latin Extended-D

Please fill the empty spaces at A7D2 and A7D4 with the characters those
spaces have been left for, LATIN CAPITAL LETTER DOUBLE THORN and LATIN
CAPITAL LETTER DOUBLE WYNN respectively. These characters come from the
Ormulum, an important and very long Middle English text for which the
author, Orm, devised an orthography which marked short vowel length
regularly by doubling letters after the vowel (as in "menn" 'men' and "wiþþ"
'with'). Orm's orthography also marked this by superscripting a letter (as
in "menᷠ" 'men') but where a short vowel preceded -þ or -w (-ƿ), the bowl of
the thorn and the wynn were doubled. Orm knew very well what a capital
letter was and he was scrupulous in using them. The addition of TIRONIAN
SIGN CAPITAL ET to the UCS was in part based on the evidence from Orm's
text. Double Thorn and Double Wynn would not begin a word or sentence
because the orthography uses the double characters after vowels, but if Orm
(or a modern editor, like me, who am preparing a palaeographic reading of
The Ormulum) wanted to write a word in ALL CAPS or in ꜱᴍᴀʟʟ ᴄᴀᴘɪᴛᴀʟꜱ, he
(and I) would certainly know to do so. This argument has been put forward
many times for letters used in natural orthographies (indeed it was put
forward for other characters in Latin Extended-D). The UTC has not explained
to me why they have left the blanks. If they are waiting to find out if Orm
ever wrote the word WIÞÞ 'WITH' in all caps, well, I do not have the answer,
because the text is 30,000 lines long. But Orm is dead, and Orm is not
trying to use the Unicode Standard. I and other scholars who work with the
Ormulum have a reasonable expectation that its characters should behave as
normal. An editor who wishes to write a vocabulary and use ALL CAPS in the
headwords should be able to do so. An editor who submits an article title
with "wiþþ" in it (with a double thorn) to a journal that puts the article
titles in small caps in the header will expect normal casing behaviour.
Casing behaviour is a natural function of the Latin script.
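
The casing expectation described above can be illustrated with a short
Python sketch. It assumes that the small double thorn sits at U+A7D3, the
slot next to the reserved A7D2; that code point, the helper name
has_uppercase, and the sample word are assumptions made for illustration
only. What the second print shows depends on the Unicode data bundled with
the Python version in use.

```python
def has_uppercase(ch: str) -> bool:
    """True if str.upper() maps ch to a different string."""
    return ch.upper() != ch


# The ordinary thorn has a capital, so an all-caps headword works as expected.
plain = "wiþþ"
print(plain.upper())

# Assumption: U+A7D3 is LATIN SMALL LETTER DOUBLE THORN. Without a capital
# counterpart in the character data, upper() leaves it untouched and the
# headword comes out in mixed case.
DOUBLE_THORN = "\uA7D3"
word = "wi" + DOUBLE_THORN
print(word.upper())

# Compare the two letters directly.
print(has_uppercase("þ"), has_uppercase(DOUBLE_THORN))
```

A journal pipeline that applies upper() or small caps to a title would hit
exactly this difference between the two spellings.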

Please fill the empty spaces at A7D2 and A7D4 with the characters those
spaces have been left for, LATIN CAPITAL LETTER DOUBLE THORN and LATIN
CAPITAL LETTER DOUBLE WYNN respectively. 

Date/Time: Fri Apr 16 17:37:02 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A

1FAF1 RIGHTWARD BACKHAND and 1FAF2 LEFTWARD HAND are misnamed. "Backhand"
refers to a kind of tennis swing; it does not refer to the back of a hand.
Handedness is something the UCS should have dealt with long ago.
RIGHT-POINTING BACK OF HAND is what the first one is, and LEFT-POINTING
FRONT OF HAND is what the other one is. All of the existing hands should be
looked at with regard to this. 

Note that the THUMBS UP and THUMBS DOWN hands are not completely encoded.
Users should be able to select whether they wish to show hands with thumbs
up or down based on how they would see it if they were holding their hands
out in front of them. When I look at my right hand thumbs up I see the palm.
When I look at my right hand thumbs down, I see the back.

This is Alpha, so if there is a wish to make some of these hands make sense,
now is the time to complete the set logically. I would help between now and
beta if asked.

Date/Time: Fri Apr 16 18:20:45 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Latin Extended-D

Patrik Sjöwall has suggested that OLD POLISH O should be named O ROGATE. I
do not find the word ROGATE /roʊɡeɪt/ in the Oxford English Dictionary. I am
sorry to disagree with him, but in the absence of knowing what "rogate"
means I can't recommend this, and its non-appearance in the OED—well, it
means that even I don't know how to find out what a "rogate O" is. "Rogare"
means 'to ask' in Latin. OLD POLISH O means it is a kind of O found in Old
Polish, not that it is /o/ in Old Polish. 

Date/Time: Fri Apr 16 18:27:01 CDT 2021
Name: Michael Everson
Report Type: Public Review Issue
Opt Subject: Symbols and Pictographs Extended-A


1FAF3 should be PALM FACING UPWARDS
1FAF4 should be PALM FACING DOWNWARDS
1FAF5 should be UNCLE SAM HAND (well, okay) or HAND WITH INDEX FINGER POINTING FORWARD

POINTING AT THE VIEWER should at least be POINTING TOWARDS VIEWER, if the viewer has to be taken into account.