Public Review Issues

Accumulated Feedback on PRI #540

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SEW]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESR]
Feedback routed to Editorial Working Group for evaluation [EDC]
Feedback routed to Charts Working Group for evaluation [CHARTS]
Other Reports

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]

(None at this time.)

Feedback routed to Script Encoding Working Group for evaluation [SEW]

Date/Time: Fri Jan 30 14:22:06 PT 2026
ReportID: ID20260130142206
Name: Charlotte Buff
Report Type: PRI Feedback
Opt Subject: Shaaldaa character names are inconsistent

As per consensus 186-C20, the code points for the Shaaldaa script have been provisionally assigned using the character names in section V of 
L2/26-040R. However, the document in question also includes a second list of character names in section IV that differs in a number of ways:

U+1C800..U+1C817 and U+1CAA0..U+1CAB7 include the word “SYLLABLE” in section IV but not in section V
U+1C80C: “SECONDARY VOWEL BASE” (section IV) vs. “SECONDARY BASE” (section V)
use of “GEMINATE” (section IV) vs. “GEMINATED” (section V)

The section V names contain some errors:

U+1C804 includes the word “VOWEL” but none of the other vowels do
U+1CAB2..U+1CAB5 are duplicates of U+1CAAD..U+1CAB0; the latter should use single vowels instead of double vowels

The section IV names appear to be more consistent with existing naming practices on the whole, but they also contain some errors:

U+1CA0D should be syllable KKHEE instead of KKXEE
U+1CB20 uses “SMR” instead of “DIGIT”

Date/Time: Thu Feb 21 10:00:21 PT 2026
ReportID: ID20260221100021
Name: Anne Xuan
Report Type: PRI Feedback
Opt Subject: Request UTC consider that Cirth and Tengwar on the Unicode’s Roadmaps

The UTC rejected the Klingon script encoding proposal, but two other scripts used for conlangs (constructed languages) still appear on 
the Unicode Roadmaps. These two scripts are not used by anyone for real-life communication. I request that the UTC consider removing Cirth and 
Tengwar from the Unicode Roadmaps.

Also, in documents L2/20-169 and L2/21-174, the UTC said it could reconsider its action on the rejected Klingon script if scripts of this kind were 
accepted by the UTC. I think re-adding it to the Roadmaps in the future is a better choice.

Reference
1. WG 2 N 1641, Proposal to encode Tengwar in Plane 1 of ISO/IEC 10646-2
2. WG 2 N 1642, Proposal to encode Cirth in Plane 1 of ISO/IEC 10646-2
3. L2/20-169, Recommendations to UTC #164 July 2020 on Script Proposals
4. L2/21-174, Recommendations to UTC #169 October 2021 on Script Proposals
5. Unicode Roadmaps for SMP www.unicode.org/roadmaps/smp

Date/Time: Tue Mar 10 01:51:04 PT 2026
ReportID: ID20260310015104
Name: ayaan
Report Type: Report Error in Publication/Data
Opt Subject: Incorrect glyph of U+06C4 in unicode charts

The Unicode charts display an incorrect glyph for the character “ۄ” (U+06C4), which is used in the Kashmiri language. 

The correct glyph should have the small circle as the last part of the stroke; the stroke should not extend beyond that point. 
The current glyph incorrectly extends the base stroke past the small circle instead of terminating there.

Here are some screenshots of the correct form from Kashmiri books. 

Source: Kashir Dictionary (https://archive.org/details/dli.ernet.241982/page/n51/mode/2up)

Source: JKBOSE Class 6 textbook (https://jkbose.jk.gov.in/PageDoc/Kashmiri%20class%20VI%20-2024.pdf)

Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]

Date/Time: Sun Jan 08 23:01:12 PT 2026
ReportID: ID20260108230112
Name: Mikhail Merkuryev
Report Type: Report Error in Publication/Data
Opt Subject: Again breaking by hyphen


I’m pleased that you’ve taken my issue up for discussion. I suggest adding the following to TR14, section 5.3, Use of Hyphen. Edit as you see fit.

Unless you’ve done morphological analysis, we strongly discourage you from:

Breaking out one character: 7- / bit, да- / с (Russian: yes milord)

And discourage you from:

Breaking out two characters: кто- / то (Russian: someone)
Breaking out short numbers: 128- / bit

No change in formal algorithms.
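
The guidance in this report could be approximated by a length check on the two fragments around the hyphen. The following is a minimal sketch, not part of UAX #14: the function name `classify_hyphen_break`, the returned labels, and the three-digit cutoff for a "short number" are all assumptions made for illustration.

```python
def classify_hyphen_break(word: str, short_number_digits: int = 3) -> str:
    """Rate a line break placed after the first hyphen in `word`.

    Hypothetical helper: the labels and the digit cutoff are assumptions,
    not values taken from UAX #14.
    """
    head, sep, tail = word.partition("-")
    if not sep or not head or not tail:
        return "not applicable"  # no hyphen with text on both sides
    # Breaking out a single character on either side: "7- / bit", "да- / с".
    if len(head) == 1 or len(tail) == 1:
        return "strongly discouraged"
    # Breaking out two characters on either side: "кто- / то".
    if len(head) == 2 or len(tail) == 2:
        return "discouraged"
    # Breaking out a short number before the hyphen: "128- / bit".
    if head.isdigit() and len(head) <= short_number_digits:
        return "discouraged"
    return "allowed"
```

A line breaker without morphological analysis might consult such a rating before accepting a break opportunity after a hyphen. Lengths here are counted in code points, which is a simplification.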

Date/Time: Thu Jan 12 16:07:38 PT 2026
ReportID: ID20260112160738
Name: Meghan Denny
Report Type: Report Error in Publication/Data
Opt Subject: typo in idna/Idna2008.txt comment


https://www.unicode.org/Public/17.0.0/idna/Idna2008.txt contains the following comment:


# Field 1: IDNA2008_Category, consisting of one of these values
#            "PVALID"     - Protocol valid (generally Letters, Digits and Hyphen)
#            "CONTEXTJ"   - Join control
#            "CONTEXT0"   - Other code points requiring context
#            "DISALLOWED" - The code point is not allowed in IDNA2008
#            "UNASSIGNED" - The code point is not assigned in this version

"CONTEXT0" should be "CONTEXTO" in the next release as that would reflect the data accurately.

Date/Time: Thu Feb 5 15:56:17 PT 2026
ReportID: ID20260205155617
Name: Sergiusz Wolicki
Report Type: Report Error in Publication/Data
Opt Subject: No HH in field 1 description in LineBreak.txt


https://www.unicode.org/Public/UCD/latest/ucd/LineBreak.txt:

# Field 1: Line_Break property, consisting of one of the following values:
#   Non-tailorable:
#         "BK", "CM", "CR", "GL", "LF", "NL", "SP", "WJ", "ZW", "ZWJ"
#   Tailorable:
#         "AI", "AK", "AL", "AP", "AS", "B2", "BA", "BB", "CB", "CJ",
#         "CL", "CP", "EB", "EM", "EX", "H2", "H3", "HL", "HY", "ID",
#         "IN", "IS", "JL", "JT", "JV", "NS", "NU", "OP", "PO", "PR",
#         "QU", "RI", "SA", "SG", "SY", "VF", "VI", "XX"

The new HH property value is missing from the list.
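
A mismatch like this between a data file's header comment and its data lines could be caught mechanically. The sketch below is an illustration only: the function name `undocumented_values` is hypothetical, it assumes the semicolon-separated layout with `#` comments used by UCD data files, and the sample text is a made-up miniature rather than real LineBreak.txt content.

```python
import re

def undocumented_values(text: str) -> set[str]:
    """Return field-1 values used in data lines but never quoted in a header comment."""
    documented: set[str] = set()
    used: set[str] = set()
    for line in text.splitlines():
        if line.startswith("#"):
            # Header comments list the allowed values in double quotes.
            documented.update(re.findall(r'"([A-Za-z0-9_]+)"', line))
            continue
        data = line.split("#", 1)[0]          # drop any trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 2 and fields[1]:
            used.add(fields[1])
    return used - documented

# Illustrative miniature file (assumed content, not copied from the UCD).
SAMPLE = '''\
# Field 1: Line_Break property, consisting of one of the following values:
#         "AL", "HY"
0041..005A ; AL # illustrative line
002D       ; HY # illustrative line
2010       ; HH # illustrative line
'''

print(undocumented_values(SAMPLE))
```

The same comparison would also flag a header that spells a value differently from the data, as in the "CONTEXT0" report above.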

Date/Time: Tue Feb 10 05:15:43 PT 2026
ReportID: ID20260210051543
Name: Ismael RH
Report Type: Report Error in Publication/Data
Opt Subject: Collation of "barred closed omega"

Dear staff,

The character "closed omega" is collated as a variant of "o" in IPA Extensions (lowercase) as well as in Latin Extended-F (modifier lowercase). 
Additional EPA variants (closed omega with long stem; turned closed omega) are also collated as such in the provisional order for EPA letters in 
Latin Extended-G.

In light of this, I would like to request that the UTC place "barred closed omega" after "barred eng" so that it is likewise treated as a variant of 
"o" rather than of "w", consistent with the other barred omegas that are encoded or yet to be encoded.

Yours truly,
Ismael

Date/Time: Mon Feb 9 06:36:22 PT 2026
ReportID: ID20260209063622
Name: Mikhail Merkuryev
Report Type: Report Error in Publication/Data
Opt Subject: Proto-cuneiform: suspect wrong data

Chars 12550…125A7 are Xsux (cuneiform)

125A8…1264B are Pcun (proto-cuneiform)

1264C…12686 are Xsux again?

Are you sure what you are doing? Shouldn’t they be all Pcun?

Date/Time: Sun Feb 22 08:48:37 PT 2026
ReportID: ID20260222084837
Name: Paul Wood FRSA
Report Type: Report Error in Publication/Data
Opt Subject: confusables.txt and NFKC conflicts (31 entries)

31 entries in confusables.txt (UTS #39) map a source character to a different Latin letter or digit than NFKC normalization (UAX #15) produces for 
that same character.

For example, U+017F LATIN SMALL LETTER LONG S is mapped to "f" in confusables.txt, but NFKC normalization maps it to "s". Applications that run 
NFKC before confusable detection — as recommended by IDNA (UTS #46), ENS, and GitHub — will never reach these 31 confusable entries, and if stage 
order were reversed, the confusable mapping would produce incorrect results.

The full list of conflicts:

  U+017F  Long S                  → TR39: f, NFKC: s
  U+1CCDE Outlined Capital I      → TR39: l, NFKC: i
  U+2110  Script Capital I        → TR39: l, NFKC: i
  U+2111  Fraktur Capital I       → TR39: l, NFKC: i
  U+2160  Roman Numeral One       → TR39: l, NFKC: i
  U+FF29  Fullwidth Capital I     → TR39: l, NFKC: i
  U+1D408 Math Bold Capital I     → TR39: l, NFKC: i
  U+1D43C Math Italic Capital I   → TR39: l, NFKC: i
  U+1D470 Math Bold Italic Cap I  → TR39: l, NFKC: i
  U+1D4D8 Math Bold Script Cap I  → TR39: l, NFKC: i
  U+1D540 Math Double-Struck I    → TR39: l, NFKC: i
  U+1D574 Math Bold Fraktur I     → TR39: l, NFKC: i
  U+1D5A8 Math Sans-Serif I       → TR39: l, NFKC: i
  U+1D5DC Math Sans Bold I        → TR39: l, NFKC: i
  U+1D610 Math Sans Italic I      → TR39: l, NFKC: i
  U+1D644 Math Sans Bold Italic I → TR39: l, NFKC: i
  U+1D678 Math Monospace I        → TR39: l, NFKC: i
  U+1CCF0 Outlined Digit Zero     → TR39: o, NFKC: 0
  U+1D7CE Math Bold Digit Zero    → TR39: o, NFKC: 0
  U+1D7D8 Math Double-Struck 0    → TR39: o, NFKC: 0
  U+1D7E2 Math Sans-Serif 0       → TR39: o, NFKC: 0
  U+1D7EC Math Sans Bold 0        → TR39: o, NFKC: 0
  U+1D7F6 Math Monospace 0        → TR39: o, NFKC: 0
  U+1FBF0 Segmented Digit Zero    → TR39: o, NFKC: 0
  U+1CCF1 Outlined Digit One      → TR39: l, NFKC: 1
  U+1D7CF Math Bold Digit One     → TR39: l, NFKC: 1
  U+1D7D9 Math Double-Struck 1    → TR39: l, NFKC: 1
  U+1D7E3 Math Sans-Serif 1       → TR39: l, NFKC: 1
  U+1D7ED Math Sans Bold 1        → TR39: l, NFKC: 1
  U+1D7F7 Math Monospace 1        → TR39: l, NFKC: 1
  U+1FBF1 Segmented Digit One     → TR39: l, NFKC: 1

This is not a request to change the confusables.txt mappings, which are correct visual assessments. It is a request to document in UTS #39 that 
applications applying NFKC normalization before confusable detection should filter confusables.txt entries against NFKC to avoid dead code and 
potential incorrect mappings if pipeline order is changed.

Detailed writeup: https://paultendo.github.io/posts/unicode-confusables-nfkc-conflict/
Reproducing script: https://github.com/paultendo/namespace-guard/blob/main/scripts/generate-confusables.ts
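
The kind of filtering this report asks UTS #39 to document could be sketched as follows. This is not the reporter's script: the function name `nfkc_conflicts` and the small sample table are assumptions for illustration, and lowercasing the NFKC result simply mirrors how the table above presents it. A real check would read the full confusables.txt file.

```python
import unicodedata

# Sample source -> prototype mappings. The first three come from the report
# above; the Cyrillic entry is an assumed extra that NFKC leaves unchanged.
SAMPLE_CONFUSABLES = {
    "\u017F": "f",      # LATIN SMALL LETTER LONG S
    "\u2160": "l",      # ROMAN NUMERAL ONE
    "\U0001D7CE": "o",  # MATHEMATICAL BOLD DIGIT ZERO
    "\u0430": "a",      # CYRILLIC SMALL LETTER A (assumed example)
}

def nfkc_conflicts(mapping: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (code point, confusable target, NFKC target) where the two disagree."""
    conflicts = []
    for source, prototype in mapping.items():
        normalized = unicodedata.normalize("NFKC", source)
        if normalized == source:
            continue  # NFKC leaves it alone, so the entry stays reachable
        folded = normalized.lower()  # assumption: compare case-insensitively
        if folded != prototype:
            conflicts.append((f"U+{ord(source):04X}", prototype, folded))
    return conflicts

for code_point, confusable_target, nfkc_target in nfkc_conflicts(SAMPLE_CONFUSABLES):
    print(code_point, "confusables:", confusable_target, "NFKC:", nfkc_target)
```

Entries reported by such a check are the ones a pipeline that applies NFKC first would never reach.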

Date/Time: Fri Feb 27 05:21:53 PT 2026
ReportID: ID20260227052153
Name: Yuya Hamada
Report Type: Report Error in Publication/Data
Opt Subject: There is no limit codepoint in grapheme cluster

Ref: https://unicode-org.atlassian.net/browse/ICU-23302

There is no limit on the number of code points in a grapheme cluster.

That means a grapheme cluster can include many code points.
However, computer resources are limited, so a computer can crash (denial of service) when given many code points that form a single grapheme cluster.

I think UAX #29 should state that the maximum number of code points in a grapheme cluster is limited.

A suitable maximum is 32 code points, based on the following definition in UAX #15 (UAX15-D3):

UAX15-D3. Stream-Safe Text Format: A Unicode string is said to be in Stream-Safe Text Format if it would not contain any sequences 
of non-starters longer than 30 characters in length when normalized to NFKD.

https://unicode.org/reports/tr15/#UAX15-D3

Therefore, I suggest a maximum of 32 code points per grapheme cluster in UAX #29.

Regards
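
The Stream-Safe definition quoted in this report can be checked with a short scan. This is a minimal sketch under stated assumptions: the function name `is_stream_safe` is hypothetical, it tests only the run length of non-starters after NFKD normalization as worded in UAX15-D3, and it does not perform grapheme cluster segmentation, which Python's standard library does not provide.

```python
import unicodedata

def is_stream_safe(text: str, limit: int = 30) -> bool:
    """Return True if no run of non-starters in NFKD(text) exceeds `limit`."""
    run = 0
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch) != 0:  # non-starter: nonzero combining class
            run += 1
            if run > limit:
                return False
        else:
            run = 0  # a starter ends the current run
    return True

# A base letter followed by many combining acute accents.
print(is_stream_safe("a" + "\u0301" * 30))
print(is_stream_safe("a" + "\u0301" * 31))
```

A limit on grapheme cluster length as proposed here would need a separate count over cluster boundaries, since clusters can also grow through characters that are not non-starters.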

Date/Time: Tue Mar 17 21:44:21 PT 2026
ReportID: ID20260317214421
Name: David Corbett
Report Type: Report Error in Publication/Data
Opt Subject: L2/26-070 over-restricts shorthand format controls

Two of L2/26-070’s proposals for “Misplaced Default Ignorable Code Point” are too restrictive for Duployan.

Criterion 3 flags consecutive identical non-tag default ignorable code points. However, Duployan uses sequences of U+1BCA0 SHORTHAND FORMAT 
LETTER OVERLAP and/or U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP to represent multiple letters overlapping one letter. Two consecutive overlap 
controls can be identical.

Criterion 6 requires shorthand format controls to be adjacent to Duployan. However, there can be three or more overlap controls in a row, in which 
case the middle controls are not adjacent to Duployan. Also, a Duployan character with a common-script combining mark may precede a shorthand format 
control, in which case the control is not adjacent to the Duployan character. Also, non-Duployan base characters like U+003D EQUALS SIGN can 
participate in Duployan overlap sequences.

The solution is for shorthand format controls to not be default ignorable. This would address the security issue without breaking Duployan. 
Compare the Egyptian hieroglyph format controls, which are not default ignorable.

Date/Time: Mon Mar 23 14:23:05 PT 2026
ReportID: ID20260323142305
Name: Karl Williamson
Report Type: Report Error in Publication/Data
Opt Subject: Ambiguity in UTS 39

It says

Forbid sequences of the same nonspacing mark.

Forbid sequences of more than 4 nonspacing marks (gc=Mn or gc=Me).

I believe the traditional definition of nonspacing mark is gc=Mn. The second line effectively redefines that definition to include enclosing marks. Does that redefinition apply to the line above, or just to the second line? It seems to me that sequences of the same enclosing mark would be suspicious.

The document should be clarified, but in the meantime I don't know the answer, and I'm writing code that depends on it. So please tell me.
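
To make the ambiguity concrete, the two rules might be implemented as below, with the open question exposed as a parameter. This is a sketch, not the UTS #39 reference behavior: the function name `mark_violations`, the `repeat_includes_me` flag, the result labels, and the reading of "sequences of the same nonspacing mark" as adjacent identical marks are all assumptions.

```python
import unicodedata

def mark_violations(text: str, repeat_includes_me: bool = False) -> set[str]:
    """Check the two mark rules; `repeat_includes_me` selects the reading of rule 1."""
    problems: set[str] = set()
    run = 0       # length of the current run of Mn/Me characters
    prev = None   # previous mark in the run, if any
    for ch in text:
        gc = unicodedata.category(ch)
        if gc in ("Mn", "Me"):
            run += 1
            if run > 4:
                problems.add("too-many-marks")  # rule 2 names both Mn and Me
            # Rule 1: always applies to Mn; applies to Me only under one reading.
            if ch == prev and (gc == "Mn" or repeat_includes_me):
                problems.add("repeated-mark")
            prev = ch
        else:
            run = 0
            prev = None
    return problems

# U+20DD COMBINING ENCLOSING CIRCLE has gc=Me, so the result depends on the flag.
print(mark_violations("a\u20dd\u20dd"))
print(mark_violations("a\u20dd\u20dd", repeat_includes_me=True))
```

The two calls give different answers for the same input, which is the behavior difference the report asks the document to settle.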

Feedback routed to Emoji Standard & Research Working Group for evaluation [ESR]

(None at this time.)

Feedback routed to Editorial Working Group for evaluation [EDC]

Date/Time: Sun Jan 04 11:57:58 PT 2026
ReportID: ID20260104115758
Name: Michel Mariani
Report Type: Report Error in Publication/Data
Opt Subject: Error in Core Spec - Tangut Components


• In: "Chapter 18" of the "Unicode Core Spec", in: "18.11.2 Tangut Components: U+18800–U+18AFF", under: "Repertoire", 
it is written: "In some cases, these single strokes are encoded as components (U+18900..U+18909, U+18D82..U+18D83)", 
but the first code point range is incorrect, "U+18900..U+18909" should be "U+18800..U+18809"

https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-18/#G43765

• See: "The Unicode Standard, Version 17.0 - CodeCharts.pdf"

Tangut Components

One-stroke components
18800 TANGUT COMPONENT-001
18801 TANGUT COMPONENT-002
18802 TANGUT COMPONENT-003
18803 TANGUT COMPONENT-004
18804 TANGUT COMPONENT-005
18805 TANGUT COMPONENT-006
18806 TANGUT COMPONENT-007
18807 TANGUT COMPONENT-008
18808 TANGUT COMPONENT-009
18809 TANGUT COMPONENT-010

Tangut Components Supplement

One-stroke components
18D82 TANGUT COMPONENT-771
18D83 TANGUT COMPONENT-772

Date/Time: Mon Mar 02 10:10:26 PT 2026
ReportID: ID20260302101026
Name: Night Koo
Report Type: Report Error in Publication/Data
Opt Subject: Core Spec Table 18-10 display error

Table 18-10 https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-18/#G25778 in the Core Spec is displayed incorrectly. Table 18-9 shows the symbols 
themselves in the left column, but Table 18-10 shows plain text; it should be changed to show the characters themselves.

Date/Time: Sat April 04 10:48:07 PT 2026
ReportID: ID20260404104807
Name: Zhongyu Chen
Report Type: Report Error in Publication/Data
Opt Subject: Grammatical Error at Section 2.8.1

There is an unnecessary "an" at Section 2.8.1, Planes. The incorrect text is as follows:

Additional Ideographic Planes. The Supplementary Ideographic Plane (SIP, or Plane 2) and Tertiary Ideographic Plane 
(TIP, or Plane 3) are intended as an additional allocation areas for those...

The correct text is as follows:

Additional Ideographic Planes. The Supplementary Ideographic Plane (SIP, or Plane 2) and Tertiary Ideographic Plane 
(TIP, or Plane 3) are intended as additional allocation areas for those...

Here's the link to the paragraph: https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-2/#G285877

Date/Time: Sun April 05 07:08:34 PT 2026
ReportID: ID20260405070834
Name: Zhongyu Chen
Report Type: Report Error in Publication/Data
Opt Subject: Incorrect description regarding combining marks

Section 2.11.1 of the Core Specification contains the following incorrect description:

Properties. A sequence of a base character plus one or more combining characters generally has the same properties as 
the base character. For example, “A” followed by “ˆ” has the same properties as “Â”.

"Â" is not the base character of the specified sequence. The correct description should be the following:

Properties. A sequence of a base character plus one or more combining characters generally has the same properties as 
the base character. For example, “A” followed by “ˆ” has the same properties as “A”.

Date/Time: Wed April 08 09:30:22 PT 2026
ReportID: ID20260408093022
Name: Zhongyu Chen
Report Type: Report Error in Publication/Data
Opt Subject: Character name alias isn't italicized

Section 2.13.2 of the Core Specification contains a paragraph that reads:

"Data streams (or files) that begin with a byte sequence for U+FEFF byte order mark for a given encoding form are likely to contain 
Unicode characters in that encoding form."

The name alias "byte order mark" should be italicized, according to Section A.1.2 of the same document. The corrected paragraph should be

"Data streams (or files) that begin with a byte sequence for U+FEFF byte order mark for a given encoding form are likely to contain 
Unicode characters in that encoding form."

Section 2.13.2: https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-2/#G9354

Section A.1.2: https://www.unicode.org/versions/Unicode17.0.0/core-spec/appendix-a/#G7091

Feedback routed to Charts Working Group for evaluation [CHARTS]

Date/Time: Thu Mar 05 09:04:42 PT 2026
ReportID: ID20260305090442
Name: Alfie Davies
Report Type: Report Error in Publication/Data
Opt Subject: Incorrect details for two symbols

At https://www.unicode.org/charts/PDF/U2980.pdf#page=5, there are pronunciation details given for the Unicode symbols '⧾' (TINY 29FE) and '⧿' (MINY 29FF): 
it is stated that TINY is pronounced "teenie" and MINY is pronounced "meenie". This is not correct. The canonical reference is Winning Ways for 
your Mathematical Plays, by Berlekamp, Conway, and Guy. On page 125 in Volume 1 of the second edition, you will see that it says TINY should be 
pronounced "tiny". Similarly, on page 126, it specifies explicitly that MINY should be pronounced "miny".

In case it is relevant: I am a researcher in Combinatorial Game Theory. I'm somewhat curious to know where the "teenie" and "meenie" pronunciations 
came from initially here, but I don't know if you've retained that information. 

Thanks for all of your work in maintaining this!

Date/Time: Tue Mar 17 05:03:24 PT 2026
ReportID: ID20260317050324
Name: Philippe Verdy
Report Type: Report Error in Publication/Data
Opt Subject: Counting rod "tens" U+1D369-1D371

Not about any new characters, but I just noted that the chart for Counting Rods is still grouping the counting rod "tens" digits within the same 
section as counting rod "unit" digits:

https://www.unicode.org/charts/PDF/U1D360.pdf

There should be a separate section for "Counting rod tens" (1D369-1D371) in the chart, and in "NamesList.txt" in the UCD.

Otherwise a merged section should be named "Counting rod digits", not "Counting rod units".

Other Reports

(None at this time.)