L2/19-325

Comments on Public Review Issues
(July 23 - October 6, 2019)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 6, 2019, since the previous cumulative document was issued prior to UTC #160 (July 2019).

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of October 6, 2019.

Issue Name Feedback Link
407 Proposed Update UAX #31, Unicode Identifier and Pattern Syntax (feedback) No feedback at this time
406 Proposed Update UAX #14, Unicode Line Breaking Algorithm (feedback) No feedback at this time
405 Proposed Update UTS #51, Unicode Emoji (feedback)
404 Proposed Update UTS #18, Unicode Regular Expressions (feedback)
403 Proposed Update UAX #41, Common References for Unicode Standard Annexes (feedback) No feedback at this time
402 Proposed Update UAX #50, Unicode Vertical Text Layout (feedback)
401 Proposed Update UAX #24, Unicode Script Property (feedback) No feedback at this time
400 Proposed Update UAX #38, Unicode Han Database (Unihan) (feedback)
399 Proposed Update UAX #45, U-source Ideographs (feedback)
398 Proposed Update UAX #44, Unicode Character Database (feedback)
396 Proposed Update UAX #29, Unicode Text Segmentation (feedback)
395 Proposed Update UAX #15, Unicode Normalization Forms (feedback)

The links below go to locations in this document for feedback.

Feedback to UTC / Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

 


Feedback to UTC / Encoding Proposals

(None at this time.)


Feedback on UTRs / UAXes

(None at this time.)


Error Reports

Date/Time: Tue Oct 1 19:16:39 CDT 2019
Name: Geva Patz
Report Type: Error Report
Opt Subject: Possible error in description of U+2101

Unicode code point U+2101, '℁', introduced in version 4.0 of the standard in
the 'letterlike symbols' block, bears the description 'addressed to the
subject'. I can find no evidence of the use of such an abbreviation in
English. However, in French, the phrase 'aux (bons) soins de', equivalent to
the English '(in) care of', is abbreviated 'a/s de', or sometimes just
'a/s', equivalently to the English 'c/o'. 

Given that this symbol exists in the same code block as code point U+2105,
'℅' ('care of'), I believe that the description in the standard may be
erroneous and should be updated to 'aux soins de' to reflect the actual
usage of the abbreviation.
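
The relationship between the two symbols can be checked against the
character database. The following is a minimal sketch, assuming Python 3 and
its standard unicodedata module, which reflects whatever Unicode version the
interpreter ships with. It prints the formal name and the compatibility
decomposition of U+2101 and U+2105.

```python
import unicodedata

# The two letterlike symbols discussed in the report.
for ch in ("\u2101", "\u2105"):
    name = unicodedata.name(ch)
    # NFKD applies the compatibility decomposition, which spells out
    # the letters the symbol is built from.
    spelled = unicodedata.normalize("NFKD", ch)
    print(f"U+{ord(ch):04X} {name}: {spelled}")
```

The decompositions come out as a/s and c/o respectively, which are the
abbreviations the report compares.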

Date/Time: Tue Oct 1 20:12:20 CDT 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Unclear definition of Alphabetic for marks

The description of Alphabetic in chapter 4 does not make clear under what
circumstances a combining character should have Alphabetic=Yes. Vowel signs
and other such marks that represent their own sounds are Alphabetic, but the
rest is unclear. Some combining versions of Alphabetic bases are Alphabetic
even if they don’t represent their own sounds. It is very inconsistent
though. For example, U+1DEA COMBINING LATIN SMALL LETTER SCHWA does not
represent its own sound but is used with a base vowel letter to represent an
intermediate vowel sound, yet it is Alphabetic; U+1DDC COMBINING LATIN SMALL
LETTER K is basically its own letter that is written combining to denote an
abbreviation or just to save space, yet it is not Alphabetic.
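
As a rough illustration of why the distinction is hard to predict, the
sketch below looks up the two characters with Python's standard unicodedata
module. That module does not expose the Alphabetic property (its values live
in DerivedCoreProperties.txt), so the notes in the dictionary simply repeat
the report's statements and are not read from data. The helper name
describe is hypothetical.

```python
import unicodedata

# The two combining letters contrasted in the report.
# The notes restate the report; they are not looked up programmatically.
marks = {
    "\u1DEA": "reported as Alphabetic=Yes",
    "\u1DDC": "reported as Alphabetic=No",
}

def describe(ch):
    """Return (name, General_Category, canonical combining class)."""
    return unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch)

for ch, note in marks.items():
    name, gc, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: gc={gc}, ccc={ccc} ({note})")
```

Both characters share the same General_Category and combining class, so
neither of those properties explains the difference in Alphabetic.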

Date/Time: Tue Oct 1 20:14:01 CDT 2019
Name: David Corbett
Report Type: Error Report
Opt Subject: Inconsistencies in Diacritic

The following pairs of sets are inconsistent about whether Diacritic 
is Yes or No and should be reviewed.

• U+0300 COMBINING GRAVE ACCENT and U+0301 COMBINING ACUTE ACCENT vs. U+1DC0 COMBINING DOTTED GRAVE ACCENT and U+1DC1 COMBINING DOTTED ACUTE ACCENT
• U+035D COMBINING DOUBLE BREVE and U+0361 COMBINING DOUBLE INVERTED BREVE vs. U+035C COMBINING DOUBLE BREVE BELOW and U+1DFC COMBINING DOUBLE INVERTED BREVE BELOW
• U+05A2 HEBREW ACCENT ATNAH HAFUKH vs. all other Hebrew accents
• U+082C SAMARITAN VOWEL SIGN SUKUN, U+0AFA GUJARATI SIGN SUKUN, and U+1123E KHOJKI SIGN SUKUN vs. U+0652 ARABIC SUKUN and U+07B0 THAANA SUKUN
• U+0AFB GUJARATI SIGN SHADDA and U+11237 KHOJKI SIGN SHADDA vs. U+0651 ARABIC SHADDA
• U+1ABB COMBINING PARENTHESES ABOVE vs. U+1ABE COMBINING PARENTHESES OVERLAY
• U+1BE6 BATAK SIGN TOMPI and U+1133B COMBINING BINDU BELOW vs. all other nuktas
• U+1DCA COMBINING LATIN SMALL LETTER R BELOW and Devanagari and Grantha combining letters vs. other combining Latin letters, and Cyrillic, Glagolitic, and Old Permic combining letters
• U+1DF8 COMBINING DOT ABOVE LEFT vs. U+0358 COMBINING DOT ABOVE RIGHT
• U+A8F1 COMBINING DEVANAGARI SIGN AVAGRAHA vs. combining avagrahas called SANDHI MARKs
• [:ccc=9:] in some scripts vs. [:ccc=9:] in other scripts

Additionally, the following are clearly “Characters that linguistically
modify the meaning of another character to which they apply” and so should
have Diacritic=Yes:

• U+035A COMBINING DOUBLE RING BELOW
• the Mandaic diacritics U+0859 through U+085B
• the Kharoshthi diacritics U+10A38 through U+10A3A
• U+11A33 ZANABAZAR SQUARE FINAL CONSONANT MARK
• U+1BC9D DUPLOYAN THICK LETTER SELECTOR
• SignWriting head shapes
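
To review the [:ccc=9:] item in the first list, the set itself can be
enumerated. This is a minimal sketch, assuming Python 3 and its standard
unicodedata module; that module does not expose Diacritic (the values are in
PropList.txt), so the sketch only lists the characters to compare. Grouping
by the first word of the character name is a hypothetical shortcut for
"script", not the Script property.

```python
import sys
import unicodedata
from collections import Counter

def ccc9_characters():
    """Yield every character whose canonical combining class is 9."""
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.combining(ch) == 9:
            yield ch

def script_prefix(ch):
    # Assumption: the first word of the name approximates the script.
    return unicodedata.name(ch, "UNKNOWN").split()[0]

# Count the ccc=9 characters per name prefix.
counts = Counter(script_prefix(ch) for ch in ccc9_characters())
for prefix, n in sorted(counts.items()):
    print(prefix, n)
```

Each character in the resulting set can then be looked up in PropList.txt to
see which scripts have Diacritic=Yes.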

Other Reports

Date/Time: Mon Jul 22 03:47:50 CDT 2019
Name: Denis Moyogo Jacquerye
Report Type: Error Report
Opt Subject: Preferred rendering of U+0162, U+0163


In The Unicode Standard version 12.0, Table 7-1 entitled "Preferred
Rendering of Cedilla versus Comma Below" on page 290 shows the letters c, e,
h, s in the column for letters that bear a cedilla that should look like a
traditional cedilla and the letters d, g, k, l, n, r, t in the column for
letters that bear a cedilla that should look like a comma below.

The t should not be in the comma below column but in the cedilla column. The
glyphs for the characters U+0162 and U+0163 already have a cedilla in the
Unicode charts.

The characters 
U+015E LATIN CAPITAL LETTER S WITH CEDILLA, 
U+015F LATIN SMALL LETTER S WITH CEDILLA, 
U+0162 LATIN CAPITAL LETTER T WITH CEDILLA, 
U+0163 LATIN SMALL LETTER T WITH CEDILLA 
were to be used in Romanian before the addition of
U+0218 LATIN CAPITAL LETTER S WITH COMMA BELOW, 
U+0219 LATIN SMALL LETTER S WITH COMMA BELOW, 
U+021A LATIN CAPITAL LETTER T WITH COMMA BELOW, 
U+021B LATIN SMALL LETTER T WITH COMMA BELOW.

Concerning Romanian, there is no reason why s and t should not be in the
same column in Table 7-1.
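
The encoding already treats s and t in parallel, as their canonical
decompositions show. The following is a minimal sketch, assuming Python 3
and its standard unicodedata module; the helper name mark_name is
hypothetical.

```python
import unicodedata

pairs = [
    ("\u015F", "\u0219"),  # s with cedilla, s with comma below
    ("\u0163", "\u021B"),  # t with cedilla, t with comma below
]

def mark_name(ch):
    """Name of the combining mark in the canonical decomposition of ch."""
    base, mark = unicodedata.decomposition(ch).split()
    return unicodedata.name(chr(int(mark, 16)))

for cedilla_form, comma_form in pairs:
    print(unicodedata.name(cedilla_form), "->", mark_name(cedilla_form))
    print(unicodedata.name(comma_form), "->", mark_name(comma_form))
```

Both cedilla letters decompose to a base letter plus U+0327 COMBINING
CEDILLA, and both comma below letters to a base letter plus U+0326 COMBINING
COMMA BELOW.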

T with cedilla is used with a cedilla in the Gagauz language orthography
along with c cedilla and s cedilla, an orthography made official by the
Moldovan government in Hotărîrea Parlamentului Nr. 1421 din 13-05-1993,
pentru trecerea scrisului limbii găgăuze la grafia latină and Hotărîrea
Parlamentului Nr. 816 din 24-04-1996, privind modificarea şi completarea
Hotărîrii Parlamentului pentru trecerea scrisului limbii găgăuze la grafia
latină.

T with cedilla is used with a cedilla in the Manjaku language orthography
and the Mankanya language orthography along with s cedilla, two orthographies
made official by the Senegalese government in Décret no 2005-983 du 21
octobre 2005 relatif à l’orthographe et à la séparation des mots en manjakú
and Décret n° 2005-984 du 21 octobre 2005 relatif à l'orthographe et à la
séparation des mots en mankaañ.

T with cedilla has been used with a cedilla in the UN recommended
romanization for Arabic, originally created in 1972:
http://www.eki.ee/wgrs/obs_rom_vers/rom1_ar_v4_0.htm. Note that this
romanization used other letters with cedilla and has been replaced by a 2017
romanization which doesn’t use cedillas anymore.

Date/Time: Thu Aug 29 13:30:11 CDT 2019
Name: Shmuel (Seymour J.) Metz
Report Type: Error Report
Opt Subject: There should be a warning about inserting a BOM

Section 23.8, Specials, of
http://www.unicode.org/versions/Unicode12.0.0/UnicodeStandard-12.0.pdf 
should warn against inserting a byte order mark at the beginning of a file
unless the application reading the file is known to accept it. Note that
there is a warning in the FAQ.
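
The problem can be illustrated with a short sketch, assuming Python 3 and
its built-in "utf-8" and "utf-8-sig" codecs. The sample text is a
hypothetical script header chosen because its first two characters matter to
the consumer.

```python
text = "#!/bin/sh\n"

# The "utf-8-sig" codec prepends the UTF-8 encoded byte order mark on output.
with_bom = text.encode("utf-8-sig")
print(with_bom[:3])  # the three BOM bytes EF BB BF

# A consumer that decodes as plain UTF-8 keeps U+FEFF as the first character,
# so the data no longer starts with "#!".
naive = with_bom.decode("utf-8")
print(naive.startswith("#!"))

# A consumer that expects a signature strips it and recovers the text.
aware = with_bom.decode("utf-8-sig")
print(aware == text)
```

Only the reader that is known to accept a BOM sees the original content,
which is the condition the requested warning would state.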