Accumulated Feedback on PRI #544

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

The links below go to locations in this document for feedback.

Feedback routed to CJK & Unihan Working Group for evaluation [CJK]
Feedback routed to Script Encoding Working Group for evaluation [SEW]
Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]
Feedback routed to Emoji Standard & Research Working Group for evaluation [ESR]
Feedback routed to Editorial Working Group for evaluation [EDC]
Feedback routed to Charts Working Group for evaluation [CHARTS]
Other Reports

 


Feedback routed to CJK & Unihan Working Group for evaluation [CJK]

(None at this time.)


Feedback routed to Script Encoding Working Group for evaluation [SEW]

Date/Time: Fri April 17 17:26:18 PT 2026
ReportID: ID20260417172618
Name: Jules Bertholet
Report Type: Report Error in Publication/Data
Opt Subject: Georgian_Extended has incorrect titlecase mappings


Letters in the Georgian script have two forms: the standard, lowercase-like "mkhedruli" form, and the 
variant uppercase-like "mtavruli" form. The former are encoded in the Georgian block (U+10A0..U+10FF), 
while the latter are in the Georgian Extended block (U+1C90..U+1CBF).

Unlike uppercase in other languages, however, mtavruli is only used in ALL CAPS contexts. It is not 
Used In Titlecase or. To start sentences. More information can be found in the original proposal for 
adding mtavruli to Unicode. Because of this behavior, that proposal very deliberately specified, and 
Unicode adopted, that lowercase/mkhedruli Georgian letters should titlecase to themselves, and not 
to the uppercase/mtavruli form. This ensures that applying the toTitlecase() transformation to a 
mkhedruli string does not incorrectly give a mixed-case result:

Uppercasing converts mkhedruli to mtavruli: toUppercase("ქართული ენა") = "ᲥᲐᲠᲗᲣᲚᲘ ᲔᲜᲐ"
Titlecasing leaves mkhedruli unchanged: toTitlecase("ქართული ენა") = "ქართული ენა"

However, Unicode unfortunately neglected to also make mtavruli letters titlecase to mkhedruli. 
Currently, these titlecase to themselves. When applying the titlecase transformation to an entire 
mtavruli string, this results in an incorrect mixed-case result:

Titlecasing converts mtavruli into mixed case, which is not valid in modern Georgian usage: 
toTitlecase("ᲥᲐᲠᲗᲣᲚᲘ ᲔᲜᲐ") = "Ქართული Ენა"

This error should be corrected by changing the Titlecase_Mapping of all assigned codepoints in the 
Georgian Extended block, setting it equal to the Lowercase_Mapping.
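
The difference between the current and the proposed mappings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Unicode algorithm itself: all function names are hypothetical, only Georgian letters are handled, words are split on spaces, and the mtavruli/mkhedruli pairing is computed from the fixed offset between U+1C90..U+1CBA, U+1CBD..U+1CBF and U+10D0..U+10FA, U+10FD..U+10FF rather than read from the Unicode Character Database.

```python
# Hypothetical helpers for illustration only. They cover just the Georgian
# letters discussed above and leave every other character unchanged.

# Distance between a mtavruli letter (Georgian Extended block) and its
# mkhedruli counterpart (Georgian block), e.g. U+1C90 -> U+10D0.
OFFSET = 0x1C90 - 0x10D0


def is_mtavruli(ch: str) -> bool:
    """True for the assigned letters of the Georgian Extended block."""
    cp = ord(ch)
    return 0x1C90 <= cp <= 0x1CBA or 0x1CBD <= cp <= 0x1CBF


def is_mkhedruli(ch: str) -> bool:
    """True for mkhedruli letters that have a mtavruli counterpart."""
    cp = ord(ch)
    return 0x10D0 <= cp <= 0x10FA or 0x10FD <= cp <= 0x10FF


def lower(ch: str) -> str:
    """Lowercase mapping: mtavruli -> mkhedruli."""
    return chr(ord(ch) - OFFSET) if is_mtavruli(ch) else ch


def upper(ch: str) -> str:
    """Uppercase mapping: mkhedruli -> mtavruli."""
    return chr(ord(ch) + OFFSET) if is_mkhedruli(ch) else ch


def title_current(ch: str) -> str:
    """Current behavior: both forms titlecase to themselves."""
    return ch


def title_proposed(ch: str) -> str:
    """Proposed behavior: titlecase mapping equals the lowercase mapping."""
    return lower(ch)


def to_uppercase(text: str) -> str:
    return "".join(upper(c) for c in text)


def to_titlecase(text: str, title_map) -> str:
    """Simplified titlecasing: the first character of each space-separated
    word goes through title_map, the remaining characters are lowercased."""
    words = []
    for word in text.split(" "):
        if word:
            word = title_map(word[0]) + "".join(lower(c) for c in word[1:])
        words.append(word)
    return " ".join(words)


mkhedruli = "ქართული ენა"
mtavruli = to_uppercase(mkhedruli)

print(to_titlecase(mkhedruli, title_current))   # mkhedruli stays unchanged
print(to_titlecase(mtavruli, title_current))    # mixed case (the reported problem)
print(to_titlecase(mtavruli, title_proposed))   # all mkhedruli (the proposed result)
```

With title_current, only the first letter of each word in the mtavruli string keeps its mtavruli form, which reproduces the mixed-case result described above. With title_proposed, the same input comes out entirely in mkhedruli.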


Feedback routed to Properties & Algorithms Working Group for evaluation [PAG]

(None at this time.)


Feedback routed to Emoji Standard & Research Working Group for evaluation [ESR]

(None at this time.)


Feedback routed to Editorial Working Group for evaluation [EDC]

Date/Time: Thu April 16 07:05:31 PT 2026
ReportID: ID20260416070531
Name: Sridatta A
Report Type: Report Error in Publication/Data
Opt Subject: Updates to Core Specification Chapter 6.1


Section 6.1, Writing Systems, of the Core Specification (Chapter 6 – Unicode 17.0.0) describes, 
under Abugidas, the various encoding models for abugidas/Indic scripts:

"Because of legacy practice, three distinct approaches have been taken in the Unicode Standard for the 
encoding of abugidas: the Devanagari model, the Tibetan model, and the Thai model. The Devanagari model, 
used for most abugidas, represents text in primarily phonetic order and encodes a virama character that 
can combine with adjacent consonants to create conjunct forms. The Tibetan model also uses the primarily 
phonetic order, but its subjoined consonants are encoded directly rather than as virama-consonant sequences. 
The Thai model represents text in primarily visual display order, based on the typewriter legacy; neither 
Thai nor the other scripts using this model have conjunct forms."

However, many other recently encoded scripts, such as Myanmar, Tulu-Tigalari, and Masaram Gondi, follow a 
different model in which Indic_Syllabic_Category=Pure_Killer and Indic_Syllabic_Category=Invisible_Stacker 
are separated from Indic_Syllabic_Category=Virama.
In some of these scripts, characters with Indic_Syllabic_Category=Consonant_Preceding_Repha, 
Indic_Syllabic_Category=Consonant_Medial, etc. are also encoded distinctly instead of being unified with Virama.
The broad-level major types of encoding models for abugidas/Indic scripts could be added to the same paragraph, 
or elsewhere in the core specification if a similar description of the different abugida encoding models exists there.


Feedback routed to Charts Working Group for evaluation [CHARTS]

(None at this time.)


Other Reports

(None at this time.)