L2/01-396
From: Mark Davis (JTCSV) [mark.davis@jtcsv.com]
Sent: Monday, October 22, 2001 5:52 PM
Subject: UTC Agenda Item: Property Aliases

We have internally developed aliases for all the UCD property names and property value names, which we use in Transliteration for context tests. Such aliases are also necessary for having XML formats for the UCD (a project from the last meeting). I have recently gotten a request from Perl people to have a standardized list of recommended names; right now you have to dig them out of HTML files, or they don't exist at all.

Given this level of interest, I put together a draft proposal for a new file that would list a set of recommended names for properties and property values, and would like to have this on the agenda for the next meeting.

The file is included below, and also attached in case the email messes up the line endings or the spacing. Some of the abbreviations in the ZZ items are probably not optimal -- comments welcome.

Mark

============================
# DRAFT!!
# PropertyAliases-3.2.0.txt
#
# This file contains aliases for properties and property values used in the UCD.
# These names can be used for XML formats of UCD data, for regular-expression
# property tests, and other programmatic textual descriptions of Unicode data.
# The names are not normative, except where they correspond to normative values
# in the UCD.
#
# The names may be translated in appropriate environments, and additional
# aliases may be useful.
#
# FORMAT
# Each line has three fields, separated by semicolons. Where the first field
# is AA, BB, or ZZ, then the line describes a property name.
#   AA - non-enumerated properties
#   BB - enumerated, non-binary properties
#   ZZ - binary properties
#
# (The values AA, BB, and ZZ are arbitrary -- they were simply chosen to distinguish
# the different types.)
#
# Where the first field is not one of the above, the line describes a
# property value name. The first field describes the property for which that
# property value name is used. There are two special properties:
#
#   xx stands for any binary property
#   qc stands for any quick-check property
#
# With loose matching of property names, case distinctions, whitespace,
# and '_' are ignored.
#
# NOTE: the property value names are NOT unique across properties, especially
# with loose matches. For example,
#   AL means Arabic_Letter for the Bidi_Class property, and
#   AL means Above_Left for the Combining_Class property, and
#   AL means Alphabetic for the Line_Break property.
#
# In addition, some property names may be the same as some property value names:
#   cc means the Combining_Class property, and
#   cc means the General_Category property value Control (cc)
#
# The combination of property value and property name is, however, unique.
# For more information, see UTR #18: Unicode Regular Expression Guidelines
# ================================================

AA; bmg       ; Bidi_Mirroring_Glyph
AA; cf        ; Case_Folding
AA; dm        ; Decomposition_Mapping
AA; lc        ; Lowercase_Mapping
AA; na        ; Name
AA; nv        ; Numeric_Value
AA; scc       ; Special_Case_Condition
AA; sfc       ; Simple_Case_Folding
AA; slc       ; Simple_Lowercase_Mapping
AA; stc       ; Simple_Titlecase_Mapping
AA; suc       ; Simple_Uppercase_Mapping
AA; tc        ; Titlecase_Mapping
AA; uc        ; Uppercase_Mapping

BB; bc        ; BidiClass
BB; cc        ; CombiningClass
BB; dt        ; DecompositionType
BB; ea        ; EastAsianWidth
BB; gc        ; GeneralCategory
BB; jg        ; JoiningGroup
BB; jt        ; JoiningType
BB; lb        ; LineBreak
BB; nt        ; NumericType
BB; sc        ; Script

bc; AL        ; Arabic_Letter
bc; AN        ; Arabic_Number
bc; B         ; Paragraph_Separator
bc; BN        ; Boundary_Neutral
bc; CS        ; Common_Separator
bc; EN        ; European_Number
bc; ES        ; European_Separator
bc; ET        ; European_Terminator
bc; L         ; Left_To_Right
bc; LRE       ; Left_To_Right_Embedding
bc; LRO       ; Left_To_Right_Override
bc; NSM       ; Nonspacing_Mark
bc; ON        ; Other_Neutral
bc; PDF       ; Pop_Directional_Format
bc; R         ; Right_To_Left
bc; RLE       ; Right_To_Left_Embedding
bc; RLO       ; Right_To_Left_Override
bc; S         ; Segment_Separator
bc; WS        ; White_Space

cc; A         ; Above
cc; AL        ; Above_Left
cc; AR        ; Above_Right
cc; ATA       ; Attached_Above
cc; ATAL      ; Attached_Above_Left
cc; ATAR      ; Attached_Above_Right
cc; ATB       ; Attached_Below
cc; ATBL      ; Attached_Below_Left
cc; ATBR      ; Attached_Below_Right
cc; ATL       ; Attached_Left
cc; ATR       ; Attached_Right
cc; B         ; Below
cc; BL        ; Below_Left
cc; BR        ; Below_Right
cc; DA        ; Double_Above
cc; DB        ; Double_Below
cc; IS        ; Iota_Subscript
cc; KV        ; Kana_Voicing
cc; L         ; Left
cc; NK        ; Nukta
cc; NR        ; Not_Reordered
cc; OV        ; Overlay
cc; R         ; Right
cc; VR        ; Virama

dt; ca        ; canonical
dt; ci        ; circle
dt; co        ; compat
dt; fi        ; final
dt; fo        ; font
dt; fr        ; fraction
dt; in        ; initial
dt; is        ; isolated
dt; me        ; medial
dt; na        ; narrow
dt; nb        ; no_Break
dt; no        ; none
dt; sb        ; sub
dt; sm        ; small
dt; sp        ; super
dt; sq        ; square
dt; ve        ; vertical
dt; wi        ; wide

ea; A         ; Ambiguous
ea; F         ; Fullwidth
ea; H         ; Halfwidth
ea; N         ; Neutral
ea; Na        ; Narrow
ea; W         ; Wide

gc; Cc        ; Control
gc; Cf        ; Format
gc; Cn        ; Unassigned
gc; Co        ; Private_Use
gc; Cs        ; Surrogate
gc; Ll        ; Lowercase_Letter
gc; Lm        ; Modifier_Letter
gc; Lo        ; Other_Letter
gc; Lt        ; Titlecase_Letter
gc; Lu        ; Uppercase_Letter
gc; Mc        ; Spacing_Mark
gc; Me        ; Enclosing_Mark
gc; Mn        ; Nonspacing_Mark
gc; Nd        ; Decimal_Number
gc; Nl        ; Letter_Number
gc; No        ; Other_Number
gc; Pc        ; Connector_Punctuation
gc; Pd        ; Dash_Punctuation
gc; Pe        ; Close_Punctuation
gc; Pf        ; Final_Punctuation
gc; Pi        ; Initial_Punctuation
gc; Po        ; Other_Punctuation
gc; Ps        ; Open_Punctuation
gc; Sc        ; Currency_Symbol
gc; Sk        ; Modifier_Symbol
gc; Sm        ; Math_Symbol
gc; So        ; Other_Symbol
gc; Zl        ; Line_Separator
gc; Zp        ; Paragraph_Separator
gc; Zs        ; Space_Separator

jg; AIN       ; AIN
jg; ALAPH     ; ALAPH
jg; ALEF      ; ALEF
jg; BEH       ; BEH
jg; BETH      ; BETH
jg; DAL       ; DAL
jg; DALATH_RISH; DALATH_RISH
jg; E         ; E
jg; FEH       ; FEH
jg; FINAL_SEMKATH; FINAL_SEMKATH
jg; GAF       ; GAF
jg; GAMAL     ; GAMAL
jg; HAH       ; HAH
jg; HAMZA_ON_HEH_GOAL; HAMZA_ON_HEH_GOAL
jg; HE        ; HE
jg; HEH_GOAL  ; HEH_GOAL
jg; HEH       ; HEH
jg; HETH      ; HETH
jg; KAF       ; KAF
jg; KAPH      ; KAPH
jg; KNOTTED_HEH; KNOTTED_HEH
jg; LAM       ; LAM
jg; LAMADH    ; LAMADH
jg; MEEM      ; MEEM
jg; MIM       ; MIM
jg; NO_JOINING_GROUP; NO_JOINING_GROUP
jg; NOON      ; NOON
jg; NUN       ; NUN
jg; PE        ; PE
jg; QAF       ; QAF
jg; QAPH      ; QAPH
jg; REH       ; REH
jg; REVERSED_PE; REVERSED_PE
jg; SAD       ; SAD
jg; SADHE     ; SADHE
jg; SEEN      ; SEEN
jg; SEMKATH   ; SEMKATH
jg; SHIN      ; SHIN
jg; SWASH_KAF ; SWASH_KAF
jg; TAH       ; TAH
jg; TAW       ; TAW
jg; TEH_MARBUTA; TEH_MARBUTA
jg; TETH      ; TETH
jg; WAW       ; WAW
jg; YEH_BARREE; YEH_BARREE
jg; YEH_WITH_TAIL; YEH_WITH_TAIL
jg; YEH       ; YEH
jg; YUDH_HE   ; YUDH_HE
jg; YUDH      ; YUDH
jg; ZAIN      ; ZAIN

jt; C         ; Join_Causing
jt; D         ; Dual_Joining
jt; L         ; Left_Joining
jt; R         ; Right_Joining
jt; T         ; Transparent
jt; U         ; Non_Joining

lb; AI        ; Ambiguous
lb; AL        ; Alphabetic
lb; B2        ; Break_Both
lb; BA        ; Break_After
lb; BB        ; Break_Before
lb; BK        ; Mandatory_Break
lb; CB        ; Contingent_Break
lb; CL        ; Close_Punctuation
lb; CM        ; Combining_Mark
lb; CR        ; Carriage_Return
lb; EX        ; Exclamation
lb; GL        ; Glue
lb; HY        ; Hyphen
lb; ID        ; Ideographic
lb; IN        ; Inseparable
lb; IS        ; Infix_Numeric
lb; LF        ; Line_Feed
lb; NS        ; Nonstarter
lb; NU        ; Numeric
lb; OP        ; Open_Punctuation
lb; PO        ; Postfix_Numeric
lb; PR        ; Prefix_Numeric
lb; QU        ; Quotation
lb; SA        ; Complex_Context
lb; SG        ; Surrogate
lb; SP        ; Space
lb; SY        ; Break_Symbols
lb; XX        ; Unknown
lb; ZW        ; ZWSpace

nt; de        ; decimal
nt; di        ; digit
nt; no        ; none
nt; nu        ; numeric

qc; M         ; Maybe
qc; N         ; No
qc; Y         ; Yes

sc; Arab      ; Arabic
sc; Armn      ; Armenian
sc; Beng      ; Bengali
sc; Bopo      ; Bopomofo
sc; Cans      ; Canadian_Aboriginal
sc; Cher      ; Cherokee
sc; Cyrl      ; Cyrillic
sc; Deva      ; Devanagari
sc; Dsrt      ; Deseret
sc; Ethi      ; Ethiopic
sc; Geor      ; Georgian
sc; Goth      ; Gothic
sc; Grek      ; Greek
sc; Gujr      ; Gujarati
sc; Guru      ; Gurmukhi
sc; Hang      ; Hangul
sc; Hani      ; Han
sc; Hebr      ; Hebrew
sc; Hira      ; Hiragana
sc; Ital      ; Old_Italic
sc; Kana      ; Katakana
sc; Khmr      ; Khmer
sc; Knda      ; Kannada
sc; Laoo      ; Lao
sc; Latn      ; Latin
sc; Mlym      ; Malayalam
sc; Mong      ; Mongolian
sc; Mymr      ; Myanmar
sc; Ogam      ; Ogham
sc; Orya      ; Oriya
sc; Qaai      ; Inherited
sc; Runr      ; Runic
sc; Sinh      ; Sinhala
sc; Syrc      ; Syriac
sc; Taml      ; Tamil
sc; Telu      ; Telugu
sc; Thaa      ; Thaana
sc; Thai      ; Thai
sc; Tibt      ; Tibetan
sc; Yiii      ; Yi
sc; Zyyy      ; Common

xx; F         ; False
xx; T         ; True

ZZ; AHex      ; ASCII_Hex_Digit
ZZ; Alpha     ; Alphabetic
ZZ; BidiC     ; Bidi_Control
ZZ; BidiM     ; Bidi_Mirrored
ZZ; CE        ; Composition_Exclusion
ZZ; CI        ; Case_Ignorable
ZZ; Comp_Ex   ; Full_Composition_Exclusion
ZZ; Dash      ; Dash
ZZ; Dep       ; Deprecated
ZZ; DI        ; Default_Ignorable_Code_Point
ZZ; Dia       ; Diacritic
ZZ; Ext       ; Extender
ZZ; FC_NFC    ; FC_NFC_Closure
ZZ; FC_NFKC   ; FC_NFKC_Closure
ZZ; GrBase    ; Grapheme_Base
ZZ; GrExt     ; Grapheme_Extend
ZZ; GrLink    ; Grapheme_Link
ZZ; Hex       ; Hex_Digit
ZZ; Hyphen    ; Hyphen
ZZ; IDC       ; ID_Continue
ZZ; Ideo      ; Ideographic
ZZ; IDS       ; ID_Start
ZZ; IDSB      ; IDS_Binary_Operator
ZZ; IDST      ; IDS_Trinary_Operator
ZZ; JoinC     ; Join_Control
ZZ; Lower     ; Lowercase
ZZ; Math      ; Math
ZZ; NBrk      ; Non_Break
ZZ; NChar     ; Noncharacter_Code_Point
ZZ; NFC_QC    ; NFC_Quick_Check
ZZ; NFD_QC    ; NFD_Quick_Check
ZZ; NFKC_QC   ; NFKC_Quick_Check
ZZ; NFKD_QC   ; NFKD_Quick_Check
ZZ; OAlpha    ; Other_Alphabetic
ZZ; OCI       ; Other_Case_Ignorable
ZZ; ODI       ; Other_Default_Ignorable_Code_Point
ZZ; OGrExt    ; Other_Grapheme_Extend
ZZ; OLower    ; Other_Lowercase
ZZ; OMath     ; Other_Math
ZZ; OUpper    ; Other_Uppercase
ZZ; QMark     ; Quotation_Mark
ZZ; Radical   ; Radical
ZZ; SDot      ; Special_Dotted
ZZ; Term      ; Terminal_Punctuation
ZZ; UIdeo     ; Unified_Ideograph
ZZ; Upper     ; Uppercase
ZZ; WSpace    ; White_Space
ZZ; XIDC      ; XID_Continue
ZZ; XIDS      ; XID_Start
ZZ; XO_NFC    ; Expands_On_NFC
ZZ; XO_NFD    ; Expands_On_NFD
ZZ; XO_NFKC   ; Expands_On_NFKC
ZZ; XO_NFKD   ; Expands_On_NFKD
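
The FORMAT and loose-matching notes in the file header could be exercised with a small reader along the following lines. This is only a sketch under stated assumptions, not part of the proposal: the function names (loose, parse_aliases, lookup_value) are hypothetical, fields are assumed to be separated by semicolons with '#' starting a comment, and value names are keyed by the pair (property, value) because the header says only that combination is unique.

```python
import re

# First-field markers that introduce a property name (from the FORMAT notes).
PROPERTY_KINDS = {"AA": "non-enumerated", "BB": "enumerated", "ZZ": "binary"}


def loose(name):
    """Loose matching key: ignore case distinctions, whitespace, and '_'."""
    return re.sub(r"[\s_]+", "", name).lower()


def parse_aliases(text):
    """Return (properties, values) lookup tables built from alias lines."""
    properties = {}  # loose property name -> (kind, short alias, long name)
    values = {}      # (loose property alias, loose value name) -> (short, long)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) != 3:
            raise ValueError("expected three fields: %r" % raw)
        first, short, long_name = fields
        if first in PROPERTY_KINDS:
            entry = (PROPERTY_KINDS[first], short, long_name)
            properties[loose(short)] = entry
            properties[loose(long_name)] = entry
        else:
            entry = (short, long_name)
            values[(loose(first), loose(short))] = entry
            values[(loose(first), loose(long_name))] = entry
    return properties, values


def lookup_value(properties, values, prop, value):
    """Resolve a value name in the context of a property name."""
    kind, short, _long = properties[loose(prop)]
    # Binary properties share the value names listed under the special 'xx'.
    owner = "xx" if kind == "binary" else loose(short)
    return values[(owner, loose(value))]


# A short excerpt of the draft file, used only for illustration.
SAMPLE = """\
# excerpt of the draft alias file
BB; bc        ; BidiClass
BB; cc        ; CombiningClass
BB; lb        ; LineBreak
ZZ; WSpace    ; White_Space
bc; AL        ; Arabic_Letter
cc; AL        ; Above_Left
lb; AL        ; Alphabetic
xx; T         ; True
"""

if __name__ == "__main__":
    props, vals = parse_aliases(SAMPLE)
    for prop in ("Bidi Class", "combining_class", "LB"):
        print(prop, "->", lookup_value(props, vals, prop, "al"))
    print("white space ->", lookup_value(props, vals, "white space", "t"))
```

Looking up the value name AL under three different properties returns three different long names, which is the ambiguity the NOTE in the header warns about. The sketch does not give the special qc property any dedicated handling, since the draft lists the quick-check properties under ZZ.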