Public Review Issues

Accumulated Feedback on PRI #514

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Wed Feb 05 20:22:05 CST 2025
ReportID: ID20250205202205
Name: Shieru Asakoto
Report Type: Public Review Issue
Opt Subject: 514 [CJK]


I found several errors in the draft USourceData.txt for UAX #45, listed as follows:

1. Fields 8 and 9 for UTC-03571 through UTC-03581 are reversed.
2. The status of UTC-00588 is incorrect: should be 'WS-2021' instead of 'Rejected' because ⿰米戀 is found in WS-2021 as WS2021-03048, 
even though U-source is not recorded.

Date/Time: Wed Feb 05 22:51:46 CST 2025
ReportID: ID20250205225146
Name: kirk miller
Report Type: Public Review Issue
Opt Subject: 514 [EDC]


1ACF is listed under the header 'Tone mark used in IPA'. It could just as easily be merged under the following header, 'Compound tone diacritics', 
because like those it is a compound tone diacritic. No need IMO for two headers; the 2nd works for both. 

1AD0..1AD0 - the hyphenation is odd in the names. E.g. COMBINING VERTICAL-LINE-ACUTE looks like a combination of three elements [vertical + 
line + acute] when it's a combination of two [vertical line + acute]. But probably not worth bothering with at this point.

1AE2 ᫢ COMBINING MINUS SIGN ABOVE - might want to add a disambiguating annotation, 
      → 0304 combining macron. 

1AE5 ᫥ COMBINING SEAGULL ABOVE - might want to add a disambiguating annotation, 
      → 1AE7 combining double arch above. 

1AE7 ᫧ COMBINING DOUBLE ARCH ABOVE - might want to add a disambiguating annotation, 
      → 1AE5 combining seagull above. 

A7F1 has the note 'also used as a phonetic and phonemic wildcard character'. That note applies equally to all characters under the 
heading 'Modifier letters for Chatino (México)', and so would be better placed under the header.

Date/Time: Thu Feb 06 12:35:37 CST 2025
ReportID: ID20250206123537
Name: Bryndan Meyerholt
Report Type: Public Review Issue
Opt Subject: 514 [EDC]


The third 'Honorific word ligatures' header should go above U+FD90 instead of U+FD50.

Date/Time: Thu Feb 20 05:26:01 CST 2025
ReportID: ID20250220052601
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 514 [Charts]


The glyph for U+A7F1 MODIFIER LETTER CAPITAL S in the Latin Extended-D block is not in the same serif font style as all the other modifier 
capital letters. The current glyph should be replaced with one that is a smaller version of the code chart glyph for U+0053 LATIN CAPITAL 
LETTER S.

Date/Time: Thu Feb 20 07:12:29 CST 2025
ReportID: ID20250220071229
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 514 [PAG]


U+11DD9 TOLONG SIKI SIGN SELA currently has General_Category=Modifier_Letter while U+11DDA TOLONG SIKI SIGN HECAKA has General_Category=Other_Letter, 
which may be unintentional considering that these two signs are very similar in nature. In the original proposal (L2/23-024) they are both categorised 
as Other_Letter, but I think that Modifier_Letter is probably the more appropriate property value for both because they are not “proper” letters of 
the alphabet.

Date/Time: Sat Feb 22 05:14:03 CST 2025
ReportID: ID20250222051403
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 514 [PAG]


The five combining marks of the Tai Yo script currently have Canonical_Combining_Class=Above (230):

	U+1E6E3 TAI YO SIGN UE
	U+1E6E6 TAI YO SIGN AU
	U+1E6EE TAI YO SIGN AY
	U+1E6EF TAI YO SIGN ANG
	U+1E6F5 TAI YO SIGN OM

However, these signs sit above their base only when Tai Yo text is turned on its side to fit into a horizontal layout; when written vertically 
like normal they sit on the right of their base. As such, they should be changed to CCC=Right (226).

The model case for this is U+18A9 MONGOLIAN LETTER ALI GALI DAGALGA which has CCC=Above_Left (228) based on its position in vertical layouts 
only. When Mongolian is written horizontally, U+18A9 sits *below* left instead.

Another option would be to change the Tai Yo signs to CCC=0 since they’re all positioned on the same side anyway, but this might cause problems 
if the tone marks mentioned in section 4.3 of L2/22-289r, which sit to the left of their base, are ever encoded in the future and users have to 
guess whether to enter the vowel/final or the tone mark first; both orders would look identical since the two types of marks don’t interact 
typographically but would not be canonically equivalent. I therefore recommend against this approach.
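
The canonical-ordering behaviour this argument relies on can be sketched in Python with the standard unicodedata module. This is only an illustration under stated assumptions: the Tai Yo signs are not yet available in shipping Unicode data, so well-known Latin and Thai marks stand in for them. Marks with different non-zero combining classes are reordered during normalization, so either typing order is canonically equivalent; a mark with CCC=0 blocks reordering, so the two orders remain distinct.

```python
import unicodedata

# Stand-in characters only; none of these are Tai Yo signs.
ACUTE = "\u0301"         # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"     # COMBINING DOT BELOW, ccc=220
MAI_HAN_AKAT = "\u0E31"  # THAI CHARACTER MAI HAN-AKAT, ccc=0
MAI_EK = "\u0E48"        # THAI CHARACTER MAI EK, ccc=107
KO_KAI = "\u0E01"        # THAI CHARACTER KO KAI, a base letter


def ccc(ch: str) -> int:
    """Return the Canonical_Combining_Class of a single character."""
    return unicodedata.combining(ch)


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Different non-zero classes: normalization sorts the marks, so both
# input orders end up as the same sequence.
nonzero_pair_equivalent = canonically_equivalent(
    "a" + ACUTE + DOT_BELOW, "a" + DOT_BELOW + ACUTE
)

# One mark has ccc=0: nothing is reordered across it, so the two input
# orders stay different even if they were to render identically.
zero_pair_equivalent = canonically_equivalent(
    KO_KAI + MAI_HAN_AKAT + MAI_EK, KO_KAI + MAI_EK + MAI_HAN_AKAT
)

# The Mongolian precedent cited above.
mongolian_dagalga_ccc = ccc("\u18A9")
```

With the stand-ins above, the first comparison reports equivalence and the second does not, which mirrors the concern about assigning CCC=0 to the Tai Yo signs.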

Date/Time: Mon Feb 24 05:47:44 CST 2025
ReportID: ID20250224054744
Name: Charlotte Buff
Report Type: Public Review Issue
Opt Subject: 514 [PAG]


Regarding my previous feedback on the CCC values of Tai Yo combining marks: I have discovered that the question of whether to assign positional 
combining classes based on vertical orientation isn’t as straightforward, as the Old Uyghur script is also predominantly written vertically, but 
its CCC values reflect the position of marks in a horizontal layout. I still think that the CCC values I proposed for Tai Yo, i.e. following the 
Mongolian model, are the better option because Old Uyghur is not a living script. Old Uyghur text will almost always be found embedded in horizontal 
layouts and therefore flipped on its side, which makes the CCC assignments based on horizontal orientation reasonable. The same is not true for Tai 
Yo which has an active user community expecting the characters’ properties to reflect their most common usage.

Date/Time: Thu Mar 13 16:46:03 CDT 2025
ReportID: ID20250313164603
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 514 [SEW]


The Radical/Stroke value for U+17D0B in the code chart for Tangut and in TangutSources.txt is given as 198.6. However, the glyph for 
U+17D0B has had an extra stroke added for 17.0 (L2/23-148), so the Radical/Stroke value should actually be 198.7.

Date/Time: Tue Mar 18 00:00:03 CDT 2025
ReportID: ID20250318000003
Name: Charles Lew
Report Type: Public Review Issue
Opt Subject: 514 [CJK]


I observed that EquivalentUnifiedIdeograph.txt maps 2E81 to 5382, but I think U+20086 might be an equally good choice, if not better. 
I wonder if there's any plan to update EquivalentUnifiedIdeograph.txt to reflect this, or to have some Unihan attribute reflect 
this somehow? The goal is to make this sort of information machine-accessible. Thanks!
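
As a rough illustration of what machine access to this mapping could look like, here is a minimal Python sketch. It assumes the file uses a "source ; target # comment" layout with optional "first..last" ranges in the source field; the function name parse_equivalents and the inline sample are illustrative, and the sample carries only the 2E81 mapping mentioned above. A single target per source is assumed, which is exactly the limitation the report raises.

```python
def parse_equivalents(text: str) -> dict[int, int]:
    """Parse 'source ; target # comment' records into {source_cp: target_cp}.

    Assumed layout (not verified against the published file): hexadecimal
    code points, an optional 'first..last' range in the source field, and
    '#' starting a trailing comment.
    """
    mapping: dict[int, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        source, target = (part.strip() for part in line.split(";", 1))
        first, _, last = source.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            mapping[cp] = int(target, 16)
    return mapping


# Sample record reflecting the mapping discussed in this report.
SAMPLE = "2E81          ; 5382  #     CJK RADICAL CLIFF\n"
equivalents = parse_equivalents(SAMPLE)
```

Recording an alternative such as U+20086 would need either a multi-valued target field or a separate property, since a plain dictionary like this keeps only one equivalent per radical.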

Date/Time: Tue Mar 25 18:39:05 CDT 2025
ReportID: ID20250325183905
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 514 [SEW]


Four Tangut ideographs added to the Tangut and Tangut Supplement blocks in Unicode 17.0 have incorrect radical values:

U+187F9 has a radical/stroke value of 267.9, but it should be 766.9 as the left side is Component 766 (U+18AFD)
U+18D0A has a radical/stroke value of 267.18, but it should be 766.18 as the left side is Component 766 (U+18AFD)
U+18D0B has a radical/stroke value of 267.10, but it should be 766.10 as the left side is Component 766 (U+18AFD)
U+18D0F has a radical/stroke value of 278.11, but it should be 767.11 as the left side is Component 767 (U+18AFE)

TangutSources.txt should be updated as below for these four characters:

U+187F9	kRSTUnicode	267.9 => 766.9
U+18D0A	kRSTUnicode	267.18 => 766.18
U+18D0B	kRSTUnicode	267.10 => 766.10
U+18D0F	kRSTUnicode	278.11 => 767.11
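
The four substitutions above could be applied mechanically along the following lines. This is a sketch only: it assumes each record is three tab-separated fields (code point, property tag, value) as shown in the list above, and the helper name apply_corrections is hypothetical. A value is changed only when the existing value matches the expected old one, so a file that has already been fixed is left alone.

```python
# Corrections copied from the report above: code point -> (old, new).
CORRECTIONS = {
    "U+187F9": ("267.9", "766.9"),
    "U+18D0A": ("267.18", "766.18"),
    "U+18D0B": ("267.10", "766.10"),
    "U+18D0F": ("278.11", "767.11"),
}


def apply_corrections(lines, corrections=CORRECTIONS):
    """Return (updated_lines, applied_code_points).

    Assumes tab-separated 'code point, tag, value' records; any line
    that does not fit that shape is passed through unchanged.
    """
    updated, applied = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if (
            len(fields) == 3
            and fields[1] == "kRSTUnicode"
            and fields[0] in corrections
        ):
            old, new = corrections[fields[0]]
            if fields[2] == old:  # only touch the known-bad value
                fields[2] = new
                applied.append(fields[0])
        updated.append("\t".join(fields))
    return updated, applied


# Two records from the report plus a comment line as sample input.
sample = [
    "U+187F9\tkRSTUnicode\t267.9",
    "U+18D0F\tkRSTUnicode\t278.11",
    "# sample comment line",
]
updated, applied = apply_corrections(sample)
```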

Date/Time: Tue Mar 25 19:05:24 CDT 2025
ReportID: ID20250325190524
Name: Andrew West
Report Type: Public Review Issue
Opt Subject: 514 [SEW]


On 13 March 2025 I reported an error with the radical/stroke value of U+17D0B. However, I incorrectly requested that 
it should be changed from 198.6 to 198.7. Actually the new radical/stroke value should be 783.7 as the left side of 
the character has an additional dot stroke, so is now Tangut Component-783 (U+18D8E) which is being added in Unicode 17.0.

Date/Time: Thu Mar 27 12:37:38 CDT 2025
ReportID: ID20250327123738
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: 514 [EDC]


In section 4.8 of the core spec (https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/#G141423), it says that 
character name aliases include "alternate names" (as one type of alias in Table 4-7), and also says, "Character name aliases 
are listed in the file NameAliases.txt in the Unicode Character Database." However, clearly not all alternate names are 
listed in that file; in fact, only one is listed (only one entry has type "alternate": FEFF). So, it appears that the text in 
4.8 is not consistent with actual practice in UCD.

The text also says,
"Character name aliases are immutable, once published. ... They follow the same syntax rules as character names and are also 
guaranteed to be unique in the Unicode namespace for character names."

But clearly alternate names are not unique. For example, 1B10, 1B3F and A9BB all have an alternate name "ai".

I recommend that text in 4.8 be updated to clarify how alternate names are handled in data files and in relation to the name 
conventions and stability policies.
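
The uniqueness problem can be made concrete with a small Python sketch that scans records in the "code point;alias;type" layout used by NameAliases.txt and reports any alias shared by more than one code point. The two FEFF records follow that file; the three AI records are hypothetical and are NOT present in NameAliases.txt. They only show what would happen if the alternate name "ai" mentioned above were listed there. The function name find_shared_aliases is illustrative.

```python
from collections import defaultdict

SAMPLE = """\
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
1B10;AI;alternate
1B3F;AI;alternate
A9BB;AI;alternate
"""
# The three AI lines above are hypothetical, added for illustration only.


def find_shared_aliases(text: str) -> dict[str, list[str]]:
    """Map each alias used by more than one code point to those code points."""
    owners: dict[str, set[str]] = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blanks
        if not line:
            continue
        code_point, alias, _alias_type = line.split(";")
        owners[alias].add(code_point)
    return {
        alias: sorted(cps) for alias, cps in owners.items() if len(cps) > 1
    }


shared = find_shared_aliases(SAMPLE)
```

On this sample the only shared alias is AI, which illustrates why alternate names of this kind cannot satisfy the uniqueness guarantee quoted from section 4.8.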