L2/23-280

Title: Request for additions to ScriptExtensions
Author: Roozbeh Pournader
Date: November 21, 2023

This proposal asks for various additions to ScriptExtensions based on research I did during my script exemplars project (see L2/23-263). That project identified various characters for which we could provide an exhaustive (or sometimes almost exhaustive) list of the scripts they are used with. This matches the quality of the data for existing characters with a Script_Extensions property, and there is no reason to wait to add them to ScriptExtensions.txt.

If the UTC agrees with the addition of these characters, they can be part of a Unicode release as soon as Unicode 16.0.

The data follows, with inline comments citing the evidence for each character's use in the listed scripts:

204F          ; Adlm Arab # Po       REVERSED SEMICOLON
# NamesList lists Sindhi (in Arabic script),
# L2/14-219R says it's used in Adlam

2E30          ; Avst Orkh # Po       RING POINT
# NamesList lists Avestan, Core Spec lists Old Turkic

02C7          ; Bopo Latn # Lm       CARON
02C9..02CB    ; Bopo Latn # Lm   [3] MODIFIER LETTER MACRON..MODIFIER LETTER GRAVE ACCENT
02D9          ; Bopo Latn # Sk       DOT ABOVE
# Latin is common usage, Core Spec lists Bopomofo. Note that these are
# spacing marks.

0374          ; Copt Grek # Lm       GREEK NUMERAL SIGN
0375          ; Copt Grek # Sk       GREEK LOWER NUMERAL SIGN
# Common in Greek, Core Spec lists Coptic

2E17          ; Copt Latn # Pd       DOUBLE OBLIQUE HYPHEN
# L2/03-338 shows examples for Latin and Coptic

030E          ; Ethi Latn # Mn       COMBINING DOUBLE VERTICAL LINE ABOVE
# NamesList lists Latin, Core Spec lists Ethiopic

2FF0..2FFF    ; Hani Tang # So  [16] IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT..IDEOGRAPHIC DESCRIPTION CHARACTER ROTATION
31EF          ; Hani Tang # So       IDEOGRAPHIC DESCRIPTION CHARACTER SUBTRACTION
# Core Spec mentions both Han and Tangut. TODO: Should Nushu also be added?

02CD          ; Latn Lisu # Lm       MODIFIER LETTER LOW MACRON
# Latin is IPA usage, Core Spec lists Lisu

0358          ; Latn Osge # Mn       COMBINING DOT ABOVE RIGHT
# NamesList lists Latin, Core Spec lists Osage

030A          ; Latn Syrc # Mn       COMBINING RING ABOVE
0320          ; Latn Syrc # Mn       COMBINING MINUS SIGN BELOW
0325          ; Latn Syrc # Mn       COMBINING RING BELOW
032E          ; Latn Syrc # Mn       COMBINING BREVE BELOW
# Latin is common, Core Spec lists Syriac

030D          ; Latn Sunu # Mn       COMBINING VERTICAL LINE ABOVE
0310          ; Latn Sunu # Mn       COMBINING CANDRABINDU
# Latin is common, Sunuwar usage is documented in L2/21-157R

02D7          ; Latn Thai # Sk       MODIFIER LETTER MINUS SIGN
# Latin is IPA usage, Core Spec lists Thai

0309          ; Latn Tfng # Mn       COMBINING HOOK ABOVE
# Latin is used in Vietnamese, Core Spec lists Tifinagh

2E41          ; Adlm Arab Hung # Po       REVERSED COMMA
# Adlam is per L2/14-219R, Arabic is for usage in Sindhi,
# Core Spec lists Old Hungarian

0589          ; Armn Geor Glag # Po       ARMENIAN FULL STOP
# Armenian is per name and block, NamesList lists Georgian,
# Glagolitic is per L2/99-012

035E          ; Aghb Latn Todr # Mn       COMBINING DOUBLE MACRON
# Core Spec lists Caucasian Albanian, Latin is common usage,
# Todhri is per L2/20-188R2

0324          ; Cher Latn Syrc # Mn       COMBINING DIAERESIS BELOW
0330          ; Cher Latn Syrc # Mn       COMBINING TILDE BELOW
# Cherokee is per L2/14-064R, Latin is common,
# Syriac is per Core Spec

030C          ; Cher Latn Tale # Mn       COMBINING CARON
# Cherokee is per L2/14-064R, Latin is common,
# Tai Le is documented in Core Spec Table 16-11 and L2/01-369

0311          ; Cyrl Latn Todr # Mn       COMBINING INVERTED BREVE
# Cyrillic and Latin are common, Todhri is per L2/20-188R2

10FB          ; Geor Glag Latn # Po       GEORGIAN PARAGRAPH SEPARATOR
# Georgian and Latin are already listed in ScriptExtensions,
# Glagolitic is per L2/99-012

202F          ; Latn Mong Phag # Zs       NARROW NO-BREAK SPACE
# Latin and Mongolian are already listed by ScriptExtensions.txt,
# Phags-pa is per Core Spec

032D          ; Latn Sunu Syrc # Mn       COMBINING CIRCUMFLEX ACCENT BELOW
# Latin is common usage, Sunuwar is per
# L2/21-157R, Syriac is per Core Spec

030B          ; Cher Cyrl Latn Osge # Mn       COMBINING DOUBLE ACUTE ACCENT
# Cherokee is per L2/14-064R,
# Cyrillic and Latin are as used over precomposed letters,
# Core Spec lists Osage

0302          ; Cher Cyrl Latn Tfng # Mn       COMBINING CIRCUMFLEX ACCENT
# Cherokee is per L2/14-064R,
# Cyrillic and Latin are common, Core Spec lists Tifinagh

205D          ; Cari Grek Hung Mero # Po       TRICOLON
# Carian is per Core Spec, character was encoded for Greek,
# Old Hungarian is per Core Spec, Meroitic Hieroglyphs is per Core Spec

0306          ; Cyrl Grek Latn Perm # Mn       COMBINING BREVE
# Cyrillic, Greek, and Latin are common usage, Permic is per Core Spec

0323          ; Cher Kana Latn Syrc # Mn       COMBINING DOT BELOW
# Cherokee is per L2/14-064R, Katakana is per L2/20-209R,
# Latin is common, Core Spec lists Syriac

0313          ; Grek Latn Perm Todr # Mn       COMBINING COMMA ABOVE
# Greek and Latin are common usage, Permic is per Core Spec,
# Todhri is per L2/20-188R2

0303          ; Glag Latn Sunu Syrc Thai # Mn       COMBINING TILDE
# Glagolitic is per L2/99-012, Latin is common, Sunuwar is per L2/21-157R,
# Syriac is per Core Spec, Thai is per Core Spec

0331          ; Aghb Cher Goth Latn Sunu Thai # Mn       COMBINING MACRON BELOW
# Caucasian Albanian is per Core Spec, Cherokee is per L2/14-064R,
# Gothic is per Core Spec, Latin is common, Sunuwar is per L2/21-157R,
# Thai is per Core Spec

0305          ; Copt Elba Glag Goth Kana Latn # Mn       COMBINING OVERLINE
# Coptic is per Core Spec, Elbasan is per Core Spec,
# Glagolitic is per L2/99-012, Gothic is per Core Spec,
# Katakana is per L2/20-209R, Latin is common

205A          ; Cari Geor Glag Hung Lyci Orkh # Po       TWO DOT PUNCTUATION
# Carian is per Core Spec, Georgian is per Core Spec,
# Old Hungarian is per Core Spec, Lycian is per Core Spec,
# Old Turkic is per Core Spec

2E31          ; Avst Cari Geor Hung Kthi Lydi Samr # Po       WORD SEPARATOR MIDDLE DOT
# Avestan is per Core Spec and NamesList, Carian is per Core Spec,
# Georgian is per Core Spec, Old Hungarian is per Core Spec,
# Kaithi is per Core Spec,
# Lydian is per Core Spec,
# Samaritan is per Core Spec and NamesList

02BC          ; Beng Cyrl Deva Latn Lisu Thai Toto # Lm       MODIFIER LETTER APOSTROPHE
# Bengali is per Core Spec Section 12.2, Cyrillic is per Core Spec,
# Devanagari is for Bodo, Dogri, and Maithili, Latin is common,
# Lisu is per Core Spec, Thai is per Core Spec, Toto is per L2/19-330

3001          ; Bopo Hang Hani Hira Kana Mong Yiii # Po       IDEOGRAPHIC COMMA
# Mongolian is for Todo and Sibe per Core Spec,
# others are already in ScriptExtensions

300A          ; Bopo Hang Hani Hira Kana Lisu Mong Yiii # Ps       LEFT DOUBLE ANGLE BRACKET
300B          ; Bopo Hang Hani Hira Kana Lisu Mong Yiii # Pe       RIGHT DOUBLE ANGLE BRACKET
# Most of the scripts are already in ScriptExtensions,
# Lisu is per Core Spec and L2/08-219,
# Mongolian is for Todo and Sibe per Core Spec

3002          ; Bopo Hang Hani Hira Kana Mong Phag Yiii # Po       IDEOGRAPHIC FULL STOP
# Most of the scripts are already in ScriptExtensions,
# Mongolian is for Todo and Sibe per Core Spec,
# Phags-pa is per Core Spec

0300          ; Cher Copt Cyrl Grek Latn Perm Sunu Tale # Mn       COMBINING GRAVE ACCENT
# Cherokee is per L2/14-064R, Coptic is per Core Spec,
# Cyrillic is used in character decompositions,
# Greek is per Core Spec Table 7-2, Latin is common,
# Permic is per Core Spec, Sunuwar is per L2/21-157R,
# Tai Le is per Table 16-11 in Core Spec and L2/01-369

0301          ; Cher Cyrl Grek Latn Osge Sunu Tale Todr # Mn       COMBINING ACUTE ACCENT
# Cherokee is per L2/14-064R, Cyrillic is used in character decompositions,
# Greek is per Core Spec Table 7-2, Latin is common, Osage is per Core Spec,
# Sunuwar is per L2/21-157R,
# Tai Le is per Table 16-11 in Core Spec and L2/01-369,
# Todhri is per L2/20-188R2

0307          ; Copt Hebr Latn Perm Syrc Tale Tfng Todr # Mn       COMBINING DOT ABOVE
# Coptic is per Core Spec, Hebrew is per Core Spec, Latin is common,
# Permic is per Core Spec, Syriac is per Core Spec,
# Tai Le is per Table 16-11 in Core Spec and L2/01-369,
# Tifinagh is per Core Spec, Todhri is per L2/20-188R2

0308          ; Armn Cyrl Goth Grek Hebr Latn Perm Syrc Tale # Mn       COMBINING DIAERESIS
# Armenian is per Core Spec, Cyrillic is used in decompositions,
# Gothic is per Core Spec, Greek is per Core Spec Table 7-2,
# Hebrew is per Core Spec, Latin is common, Permic is per Core Spec,
# Syriac is per Core Spec,
# Tai Le is per Table 16-11 in Core Spec and L2/01-369

0304          ; Aghb Cher Copt Cyrl Goth Grek Latn Osge Syrc Tfng Todr # Mn       COMBINING MACRON
# Caucasian Albanian is per Core Spec, Cherokee is per L2/14-064R,
# Coptic is per Core Spec,
# Cyrillic is used in decompositions, Gothic is per Core Spec,
# Greek is per Core Spec, Latin is common, Osage is per Core Spec,
# Syriac is per Core Spec, Tifinagh is per Core Spec,
# Todhri is per L2/20-188R2

00B7          ; Avst Cari Copt Elba Geor Glag Gong Goth Grek Hani Latn Lydi Mahj Perm Shaw # Po       MIDDLE DOT
# Avestan is per Core Spec, Carian is per Core Spec,
# Coptic is per Core Spec, Elbasan is per Core Spec,
# Georgian is per Core Spec, Glagolitic is per Core Spec,
# Gunjala Gondi is per Core Spec, Gothic is per Core Spec,
# Greek is the canonical decomposition of U+0387 GREEK ANO TELEIA,
# Han is CNS 11643 0x2131 per Core Spec, Latin is common,
# Lydian is per Core Spec, Mahajani is per Core Spec,
# Permic is per Core Spec, Shavian is per Core Spec
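
The data above uses the record layout of ScriptExtensions.txt: a code point or a range written as XXXX..YYYY, a semicolon, a space-separated list of script codes, and a "#" comment. As a rough illustration only, the following Python sketch shows one way such records could be read when each record sits on its own line. The name parse_scx_line and the regular expression are assumptions made for this example; they are not part of any Unicode tool or of the proposal itself.

```python
import re
from typing import List, Optional, Tuple

# Assumed shape of one data line: "XXXX[..YYYY] ; Scr1 Scr2 ... # comment".
# Lines that start with "#" and blank lines do not match.
_DATA_LINE = re.compile(
    r"^\s*([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*([^#]*?)\s*(?:#.*)?$"
)


def parse_scx_line(line: str) -> Optional[Tuple[int, int, List[str]]]:
    """Return (first, last, scripts) for a data line, or None otherwise.

    Hypothetical helper: `first` and `last` are inclusive code points,
    equal to each other when the line names a single character.
    """
    match = _DATA_LINE.match(line)
    if match is None:
        return None
    first = int(match.group(1), 16)
    last = int(match.group(2), 16) if match.group(2) else first
    scripts = match.group(3).split()
    return first, last, scripts


if __name__ == "__main__":
    sample = [
        "204F          ; Adlm Arab # Po       REVERSED SEMICOLON",
        "# NamesList lists Sindhi (in Arabic script),",
        "02C9..02CB    ; Bopo Latn # Lm   [3] MODIFIER LETTER MACRON..MODIFIER LETTER GRAVE ACCENT",
    ]
    for text in sample:
        record = parse_scx_line(text)
        if record is not None:
            first, last, scripts = record
            print(f"U+{first:04X}..U+{last:04X}: {' '.join(scripts)}")
```

The sketch skips the evidence comments, since they are free text rather than structured data; a reviewer checking the proposal would still need to read them alongside each record.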