L2/12-072

To: UTC
From: Mark Davis
Re: Proposed UCD property: Script Identifier Status
Date: 2012-02-04

In a growing number of places, we reference the data in Tables 4-7 of UAX #31. However, we force developers to scrape that data, instead of providing a machine-readable data file in the UCD. That is both clumsy and error-prone.

I propose that we add a new provisional enumerated UCD property, tentatively called Script_Identifier_Status (sis). This property maps each script code to one of a set of values, which we would describe in UAX #31. The data reflects what is in UAX #31 already.

Issues:

1. In the proposed data file below, I introduce a new value, "Mixed", for three scripts that are not covered in UAX #31 (Common, Inherited, and Braille). I also assign "Exclusion" to a fourth uncovered script, Unknown.

2. There are a few items in Table 4 that are described by properties other than script. Their inclusion was motivated by IDNA2008; they are more properly covered in UTS #36. They are:

   [[:Extender=True:]&[:Joining_Type=Join_Causing:]]
   [:Default_Ignorable_Code_Point:]
   [:block=Combining_Diacritical_Marks_for_Symbols:]
   [:block=Musical_Symbols:]
   [:block=Ancient_Greek_Musical_Notation:]
   [:block=Phaistos_Disc:]

3. Aside from the data file, related changes would include:
   1) modifying UAX #31 to point to the data file
   2) as for all new properties:
      a) adding the property and short name to PropertyAliases.txt
      b) adding the enum values (and abbreviated names) to PropertyValueAliases.txt
      c) updating UAX #44

=============================================
DRAFT DATA FILE
=============================================

# ScriptIdentifierStatus.txt
# Date: xxx
#
# Copyright (c) 1991-2011 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# This file provides a recommended status for the use of different scripts
# in identifiers. For more information, see
# UAX #31: Unicode Identifier and Pattern Syntax
# http://www.unicode.org/reports/tr31/
#
# Each line contains 2 fields, separated by a semicolon.
#
# Field 0: The UCD script code.
#
# Field 1: The status value associated with the script code:
#
#   Recommended (rec)
#   Mixed (mix)
#   Aspirational (asp)
#   Limited_Use (lim)
#   Exclusion (exc)
#
# For a description of the meaning and usage of these values,
# see UAX #31.

Latn ; Recommended # Latin
Hani ; Recommended # Han
Cyrl ; Recommended # Cyrillic
Hira ; Recommended # Hiragana
Kana ; Recommended # Katakana
Thai ; Recommended # Thai
Arab ; Recommended # Arabic
Hang ; Recommended # Hangul
Deva ; Recommended # Devanagari
Grek ; Recommended # Greek
Hebr ; Recommended # Hebrew
Taml ; Recommended # Tamil
Knda ; Recommended # Kannada
Geor ; Recommended # Georgian
Mlym ; Recommended # Malayalam
Telu ; Recommended # Telugu
Armn ; Recommended # Armenian
Mymr ; Recommended # Myanmar
Gujr ; Recommended # Gujarati
Beng ; Recommended # Bengali
Guru ; Recommended # Gurmukhi
Laoo ; Recommended # Lao
Khmr ; Recommended # Khmer
Tibt ; Recommended # Tibetan
Sinh ; Recommended # Sinhala
Ethi ; Recommended # Ethiopic
Thaa ; Recommended # Thaana
Orya ; Recommended # Oriya
Bopo ; Recommended # Bopomofo

Zyyy ; Mixed # Common
Zinh ; Mixed # Inherited
Brai ; Mixed # Braille

Cans ; Aspirational # Canadian_Aboriginal
Yiii ; Aspirational # Yi
Mong ; Aspirational # Mongolian
Tfng ; Aspirational # Tifinagh
Plrd ; Aspirational # Miao

Syrc ; Limited_Use # Syriac
Nkoo ; Limited_Use # Nko
Cher ; Limited_Use # Cherokee
Vaii ; Limited_Use # Vai
Bali ; Limited_Use # Balinese
Bamu ; Limited_Use # Bamum
Batk ; Limited_Use # Batak
Cham ; Limited_Use # Cham
Java ; Limited_Use # Javanese
Kali ; Limited_Use # Kayah_Li
Lepc ; Limited_Use # Lepcha
Limb ; Limited_Use # Limbu
Lisu ; Limited_Use # Lisu
Mand ; Limited_Use # Mandaic
Mtei ; Limited_Use # Meetei_Mayek
Talu ; Limited_Use # New_Tai_Lue
Olck ; Limited_Use # Ol_Chiki
Saur ; Limited_Use # Saurashtra
Sund ; Limited_Use # Sundanese
Sylo ; Limited_Use # Syloti_Nagri
Tale ; Limited_Use # Tai_Le
Lana ; Limited_Use # Tai_Tham
Tavt ; Limited_Use # Tai_Viet
Cakm ; Limited_Use # Chakma

Zzzz ; Exclusion # Unknown
Samr ; Exclusion # Samaritan
Copt ; Exclusion # Coptic
Glag ; Exclusion # Glagolitic
Avst ; Exclusion # Avestan
Brah ; Exclusion # Brahmi
Bugi ; Exclusion # Buginese
Buhd ; Exclusion # Buhid
Cari ; Exclusion # Carian
Xsux ; Exclusion # Cuneiform
Cprt ; Exclusion # Cypriot
Dsrt ; Exclusion # Deseret
Egyp ; Exclusion # Egyptian_Hieroglyphs
Goth ; Exclusion # Gothic
Hano ; Exclusion # Hanunoo
Armi ; Exclusion # Imperial_Aramaic
Phli ; Exclusion # Inscriptional_Pahlavi
Prti ; Exclusion # Inscriptional_Parthian
Kthi ; Exclusion # Kaithi
Khar ; Exclusion # Kharoshthi
Linb ; Exclusion # Linear_B
Lyci ; Exclusion # Lycian
Lydi ; Exclusion # Lydian
Ogam ; Exclusion # Ogham
Ital ; Exclusion # Old_Italic
Xpeo ; Exclusion # Old_Persian
Sarb ; Exclusion # Old_South_Arabian
Orkh ; Exclusion # Old_Turkic
Osma ; Exclusion # Osmanya
Phag ; Exclusion # Phags_Pa
Phnx ; Exclusion # Phoenician
Rjng ; Exclusion # Rejang
Runr ; Exclusion # Runic
Shaw ; Exclusion # Shavian
Tglg ; Exclusion # Tagalog
Tagb ; Exclusion # Tagbanwa
Ugar ; Exclusion # Ugaritic
Merc ; Exclusion # Meroitic_Cursive
Mero ; Exclusion # Meroitic_Hieroglyphs
Shrd ; Exclusion # Sharada
Sora ; Exclusion # Sora_Sompeng
Takr ; Exclusion # Takri
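
To illustrate how a consumer might read the draft file instead of scraping UAX #31, here is a minimal Python sketch. It assumes only the layout described in the file header: two fields separated by a semicolon, with "#" starting a comment. The function name parse_script_identifier_status and the SAMPLE excerpt are hypothetical, chosen for illustration, and are not part of the proposal.

```python
from typing import Dict, Iterable


def parse_script_identifier_status(lines: Iterable[str]) -> Dict[str, str]:
    """Map each script code (field 0) to its status value (field 1).

    Hypothetical helper: assumes the two-field, semicolon-separated
    layout with '#' comments described in the draft file header.
    """
    statuses: Dict[str, str] = {}
    for raw in lines:
        # Drop any trailing comment, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"expected 2 non-empty fields, got: {raw!r}")
        code, status = fields
        statuses[code] = status
    return statuses


# A short excerpt of the draft data, used here only as sample input.
SAMPLE = """\
# Excerpt from the draft data file
Latn ; Recommended # Latin
Zyyy ; Mixed # Common
Syrc ; Limited_Use # Syriac
Zzzz ; Exclusion # Unknown
"""

statuses = parse_script_identifier_status(SAMPLE.splitlines())
print(statuses["Latn"])  # prints: Recommended
```

The sketch deliberately does not hard-code the set of status values, since the property is provisional and the value names could still change; a stricter reader could check field 1 against the values listed in the file header.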