L2/01-396

From: Mark Davis (JTCSV) [mark.davis@jtcsv.com]
Sent: Monday, October 22, 2001 5:52 PM

Subject: UTC Agenda Item: Property Aliases

We have internally developed aliases for all the UCD property names and
property value names, which we use in Transliteration for context tests.
Such aliases are also necessary for having XML formats for the UCD (a
project from the last meeting).

I have recently gotten a request from Perl people to have a standardized
list of recommended names; right now you have to dig them out of HTML files,
or they don't exist at all. Given this level of interest, I put together a
draft proposal for a new file that would list a set of recommended names for
properties and property values, and would like to have this on the agenda
for the next meeting.

The file is included below, and also attached in case the email messes up
the line endings or the spacing. Some of the abbreviations in the ZZ items
are probably not optimal -- comments welcome.

Mark

============================


# DRAFT!!
# PropertyAliases-3.2.0.txt
#
# This file contains aliases for properties and property values used in the UCD.
# These names can be used for XML formats of UCD data, for regular-expression
# property tests, and other programmatic textual descriptions of Unicode data.
# The names are not normative, except where they correspond to normative values
# in the UCD.
#
# The names may be translated in appropriate environments, and additional
# aliases may be useful.
#
# FORMAT
# Each line has three fields. Where the first field is AA, BB, or ZZ, the
# line describes a property name.
# AA - non-enumerated properties
# BB - enumerated, non-binary properties
# ZZ - binary properties
#
# (The values AA, BB, and ZZ are arbitrary -- they were simply chosen to
# distinguish the different types.)
#
# Where the first field is not one of the above, the line describes a
# property value name. The first field describes the property for which that
# property value name is used. There are two special properties:
#
# xx stands for any binary property
# qc stands for any quick-check property
#
# With loose matching of property names, case distinctions, whitespace,
# and '_' are ignored.
#
# NOTE: the property value names are NOT unique across properties, especially
# with loose matches. For example,
# AL means Arabic_Letter for the Bidi_Class property, and
# AL means Above_Left for the Combining_Class property, and
# AL means Alphabetic for the Line_Break property.
#
# In addition, some property names may be the same as some property value names:
# cc means the Combining_Class property, and
# cc means the General_Category property value Control (Cc).
#
# The combination of property value and property name is, however, unique.
# For more information, see UTR #18: Unicode Regular Expression Guidelines.
# ================================================


AA; bmg       ; Bidi_Mirroring_Glyph
AA; cf        ; Case_Folding
AA; dm        ; Decomposition_Mapping
AA; lc        ; Lowercase_Mapping
AA; na        ; Name
AA; nv        ; Numeric_Value
AA; scc       ; Special_Case_Condition
AA; sfc       ; Simple_Case_Folding
AA; slc       ; Simple_Lowercase_Mapping
AA; stc       ; Simple_Titlecase_Mapping
AA; suc       ; Simple_Uppercase_Mapping
AA; tc        ; Titlecase_Mapping
AA; uc        ; Uppercase_Mapping

BB; bc        ; Bidi_Class
BB; cc        ; Combining_Class
BB; dt        ; Decomposition_Type
BB; ea        ; East_Asian_Width
BB; gc        ; General_Category
BB; jg        ; Joining_Group
BB; jt        ; Joining_Type
BB; lb        ; Line_Break
BB; nt        ; Numeric_Type
BB; sc        ; Script

bc; AL        ; Arabic_Letter
bc; AN        ; Arabic_Number
bc; B         ; Paragraph_Separator
bc; BN        ; Boundary_Neutral
bc; CS        ; Common_Separator
bc; EN        ; European_Number
bc; ES        ; European_Separator
bc; ET        ; European_Terminator
bc; L         ; Left_To_Right
bc; LRE       ; Left_To_Right_Embedding
bc; LRO       ; Left_To_Right_Override
bc; NSM       ; Nonspacing_Mark
bc; ON        ; Other_Neutral
bc; PDF       ; Pop_Directional_Format
bc; R         ; Right_To_Left
bc; RLE       ; Right_To_Left_Embedding
bc; RLO       ; Right_To_Left_Override
bc; S         ; Segment_Separator
bc; WS        ; White_Space

cc; A         ; Above
cc; AL        ; Above_Left
cc; AR        ; Above_Right
cc; ATA       ; Attached_Above
cc; ATAL      ; Attached_Above_Left
cc; ATAR      ; Attached_Above_Right
cc; ATB       ; Attached_Below
cc; ATBL      ; Attached_Below_Left
cc; ATBR      ; Attached_Below_Right
cc; ATL       ; Attached_Left
cc; ATR       ; Attached_Right
cc; B         ; Below
cc; BL        ; Below_Left
cc; BR        ; Below_Right
cc; DA        ; Double_Above
cc; DB        ; Double_Below
cc; IS        ; Iota_Subscript
cc; KV        ; Kana_Voicing
cc; L         ; Left
cc; NK        ; Nukta
cc; NR        ; Not_Reordered
cc; OV        ; Overlay
cc; R         ; Right
cc; VR        ; Virama

dt; ca        ; canonical
dt; ci        ; circle
dt; co        ; compat
dt; fi        ; final
dt; fo        ; font
dt; fr        ; fraction
dt; in        ; initial
dt; is        ; isolated
dt; me        ; medial
dt; na        ; narrow
dt; nb        ; no_Break
dt; no        ; none
dt; sb        ; sub
dt; sm        ; small
dt; sp        ; super
dt; sq        ; square
dt; ve        ; vertical
dt; wi        ; wide

ea; A         ; Ambiguous
ea; F         ; Fullwidth
ea; H         ; Halfwidth
ea; N         ; Neutral
ea; Na        ; Narrow
ea; W         ; Wide

gc; Cc        ; Control
gc; Cf        ; Format
gc; Cn        ; Unassigned
gc; Co        ; Private_Use
gc; Cs        ; Surrogate
gc; Ll        ; Lowercase_Letter
gc; Lm        ; Modifier_Letter
gc; Lo        ; Other_Letter
gc; Lt        ; Titlecase_Letter
gc; Lu        ; Uppercase_Letter
gc; Mc        ; Spacing_Mark
gc; Me        ; Enclosing_Mark
gc; Mn        ; Nonspacing_Mark
gc; Nd        ; Decimal_Number
gc; Nl        ; Letter_Number
gc; No        ; Other_Number
gc; Pc        ; Connector_Punctuation
gc; Pd        ; Dash_Punctuation
gc; Pe        ; Close_Punctuation
gc; Pf        ; Final_Punctuation
gc; Pi        ; Initial_Punctuation
gc; Po        ; Other_Punctuation
gc; Ps        ; Open_Punctuation
gc; Sc        ; Currency_Symbol
gc; Sk        ; Modifier_Symbol
gc; Sm        ; Math_Symbol
gc; So        ; Other_Symbol
gc; Zl        ; Line_Separator
gc; Zp        ; Paragraph_Separator
gc; Zs        ; Space_Separator

jg; AIN       ; AIN
jg; ALAPH     ; ALAPH
jg; ALEF      ; ALEF
jg; BEH       ; BEH
jg; BETH      ; BETH
jg; DAL       ; DAL
jg; DALATH_RISH; DALATH_RISH
jg; E         ; E
jg; FEH       ; FEH
jg; FINAL_SEMKATH; FINAL_SEMKATH
jg; GAF       ; GAF
jg; GAMAL     ; GAMAL
jg; HAH       ; HAH
jg; HAMZA_ON_HEH_GOAL; HAMZA_ON_HEH_GOAL
jg; HE        ; HE
jg; HEH_GOAL  ; HEH_GOAL
jg; HEH       ; HEH
jg; HETH      ; HETH
jg; KAF       ; KAF
jg; KAPH      ; KAPH
jg; KNOTTED_HEH; KNOTTED_HEH
jg; LAM       ; LAM
jg; LAMADH    ; LAMADH
jg; MEEM      ; MEEM
jg; MIM       ; MIM
jg; NO_JOINING_GROUP; NO_JOINING_GROUP
jg; NOON      ; NOON
jg; NUN       ; NUN
jg; PE        ; PE
jg; QAF       ; QAF
jg; QAPH      ; QAPH
jg; REH       ; REH
jg; REVERSED_PE; REVERSED_PE
jg; SAD       ; SAD
jg; SADHE     ; SADHE
jg; SEEN      ; SEEN
jg; SEMKATH   ; SEMKATH
jg; SHIN      ; SHIN
jg; SWASH_KAF ; SWASH_KAF
jg; TAH       ; TAH
jg; TAW       ; TAW
jg; TEH_MARBUTA; TEH_MARBUTA
jg; TETH      ; TETH
jg; WAW       ; WAW
jg; YEH_BARREE; YEH_BARREE
jg; YEH_WITH_TAIL; YEH_WITH_TAIL
jg; YEH       ; YEH
jg; YUDH_HE   ; YUDH_HE
jg; YUDH      ; YUDH
jg; ZAIN      ; ZAIN

jt; C         ; Join_Causing
jt; D         ; Dual_Joining
jt; L         ; Left_Joining
jt; R         ; Right_Joining
jt; T         ; Transparent
jt; U         ; Non_Joining

lb; AI        ; Ambiguous
lb; AL        ; Alphabetic
lb; B2        ; Break_Both
lb; BA        ; Break_After
lb; BB        ; Break_Before
lb; BK        ; Mandatory_Break
lb; CB        ; Contingent_Break
lb; CL        ; Close_Punctuation
lb; CM        ; Combining_Mark
lb; CR        ; Carriage_Return
lb; EX        ; Exclamation
lb; GL        ; Glue
lb; HY        ; Hyphen
lb; ID        ; Ideographic
lb; IN        ; Inseparable
lb; IS        ; Infix_Numeric
lb; LF        ; Line_Feed
lb; NS        ; Nonstarter
lb; NU        ; Numeric
lb; OP        ; Open_Punctuation
lb; PO        ; Postfix_Numeric
lb; PR        ; Prefix_Numeric
lb; QU        ; Quotation
lb; SA        ; Complex_Context
lb; SG        ; Surrogate
lb; SP        ; Space
lb; SY        ; Break_Symbols
lb; XX        ; Unknown
lb; ZW        ; ZWSpace

nt; de        ; decimal
nt; di        ; digit
nt; no        ; none
nt; nu        ; numeric

qc; M         ; Maybe
qc; N         ; No
qc; Y         ; Yes

sc; Arab      ; Arabic
sc; Armn      ; Armenian
sc; Beng      ; Bengali
sc; Bopo      ; Bopomofo
sc; Cans      ; Canadian_Aboriginal
sc; Cher      ; Cherokee
sc; Cyrl      ; Cyrillic
sc; Deva      ; Devanagari
sc; Dsrt      ; Deseret
sc; Ethi      ; Ethiopic
sc; Geor      ; Georgian
sc; Goth      ; Gothic
sc; Grek      ; Greek
sc; Gujr      ; Gujarati
sc; Guru      ; Gurmukhi
sc; Hang      ; Hangul
sc; Hani      ; Han
sc; Hebr      ; Hebrew
sc; Hira      ; Hiragana
sc; Ital      ; Old_Italic
sc; Kana      ; Katakana
sc; Khmr      ; Khmer
sc; Knda      ; Kannada
sc; Laoo      ; Lao
sc; Latn      ; Latin
sc; Mlym      ; Malayalam
sc; Mong      ; Mongolian
sc; Mymr      ; Myanmar
sc; Ogam      ; Ogham
sc; Orya      ; Oriya
sc; Qaai      ; Inherited
sc; Runr      ; Runic
sc; Sinh      ; Sinhala
sc; Syrc      ; Syriac
sc; Taml      ; Tamil
sc; Telu      ; Telugu
sc; Thaa      ; Thaana
sc; Thai      ; Thai
sc; Tibt      ; Tibetan
sc; Yiii      ; Yi
sc; Zyyy      ; Common

xx; F         ; False
xx; T         ; True

ZZ; AHex      ; ASCII_Hex_Digit
ZZ; Alpha     ; Alphabetic
ZZ; BidiC     ; Bidi_Control
ZZ; BidiM     ; Bidi_Mirrored
ZZ; CE        ; Composition_Exclusion
ZZ; CI        ; Case_Ignorable
ZZ; Comp_Ex   ; Full_Composition_Exclusion
ZZ; Dash      ; Dash
ZZ; Dep       ; Deprecated
ZZ; DI        ; Default_Ignorable_Code_Point
ZZ; Dia       ; Diacritic
ZZ; Ext       ; Extender
ZZ; FC_NFC    ; FC_NFC_Closure
ZZ; FC_NFKC   ; FC_NFKC_Closure
ZZ; GrBase    ; Grapheme_Base
ZZ; GrExt     ; Grapheme_Extend
ZZ; GrLink    ; Grapheme_Link
ZZ; Hex       ; Hex_Digit
ZZ; Hyphen    ; Hyphen
ZZ; IDC       ; ID_Continue
ZZ; Ideo      ; Ideographic
ZZ; IDS       ; ID_Start
ZZ; IDSB      ; IDS_Binary_Operator
ZZ; IDST      ; IDS_Trinary_Operator
ZZ; JoinC     ; Join_Control
ZZ; Lower     ; Lowercase
ZZ; Math      ; Math
ZZ; NBrk      ; Non_Break
ZZ; NChar     ; Noncharacter_Code_Point
ZZ; NFC_QC    ; NFC_Quick_Check
ZZ; NFD_QC    ; NFD_Quick_Check
ZZ; NFKC_QC   ; NFKC_Quick_Check
ZZ; NFKD_QC   ; NFKD_Quick_Check
ZZ; OAlpha    ; Other_Alphabetic
ZZ; OCI       ; Other_Case_Ignorable
ZZ; ODI       ; Other_Default_Ignorable_Code_Point
ZZ; OGrExt    ; Other_Grapheme_Extend
ZZ; OLower    ; Other_Lowercase
ZZ; OMath     ; Other_Math
ZZ; OUpper    ; Other_Uppercase
ZZ; QMark     ; Quotation_Mark
ZZ; Radical   ; Radical
ZZ; SDot      ; Special_Dotted
ZZ; Term      ; Terminal_Punctuation
ZZ; UIdeo     ; Unified_Ideograph
ZZ; Upper     ; Uppercase
ZZ; WSpace    ; White_Space
ZZ; XIDC      ; XID_Continue
ZZ; XIDS      ; XID_Start
ZZ; XO_NFC    ; Expands_On_NFC
ZZ; XO_NFD    ; Expands_On_NFD
ZZ; XO_NFKC   ; Expands_On_NFKC
ZZ; XO_NFKD   ; Expands_On_NFKD
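
The FORMAT comment in the draft says each line has three fields, and that a first field of AA, BB, or ZZ marks a property name while anything else marks a value name for the property it names. A minimal Python sketch of reading that format might look like this; `parse_aliases`, `PROPERTY_KINDS`, and the sample text are illustrative names and data, not part of the proposal.

```python
from typing import Dict, List, Tuple

# First-field markers from the FORMAT comment: these lines name properties.
PROPERTY_KINDS = {
    "AA": "non-enumerated",
    "BB": "enumerated, non-binary",
    "ZZ": "binary",
}

Properties = Dict[str, Tuple[str, str]]    # short name -> (long name, kind)
Values = Dict[str, List[Tuple[str, str]]]  # first field -> [(short, long), ...]


def parse_aliases(text: str) -> Tuple[Properties, Values]:
    """Split alias lines into property names and property value names."""
    properties: Properties = {}
    values: Values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 3:
            raise ValueError(f"expected three fields: {raw!r}")
        first, short, long_name = fields
        if first in PROPERTY_KINDS:
            properties[short] = (long_name, PROPERTY_KINDS[first])
        else:
            # e.g. "gc", or the special "xx" / "qc" groups
            values.setdefault(first, []).append((short, long_name))
    return properties, values


# A few lines copied from the draft, plus a comment line.
SAMPLE = """
# comment lines are skipped
AA; na        ; Name
ZZ; Alpha     ; Alphabetic
gc; Lu        ; Uppercase_Letter
xx; T         ; True
"""

props, vals = parse_aliases(SAMPLE)
print(props["na"])  # ('Name', 'non-enumerated')
print(vals["gc"])   # [('Lu', 'Uppercase_Letter')]
```

Keeping value names in lists keyed by their first field, rather than in one flat table, preserves the property context that the NOTE in the header says is needed to disambiguate them.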

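The header states that loose matching ignores case distinctions, whitespace, and '_', and that value names such as AL are unique only in combination with a property name. One way to honor both rules is to key a lookup table on the loosely matched (property, value) pair. The sketch below assumes a small hypothetical in-memory index built from a handful of rows copied from the draft; `loose` and `lookup` are illustrative names only.

```python
import re
from typing import Dict, Tuple


def loose(name: str) -> str:
    """Loose-matching key: ignore case, whitespace, and '_'."""
    return re.sub(r"[\s_]", "", name).lower()


# (property, short alias, long name) rows copied from the draft.
VALUE_ALIASES = [
    ("bc", "AL", "Arabic_Letter"),
    ("cc", "AL", "Above_Left"),
    ("lb", "AL", "Alphabetic"),
    ("gc", "Cc", "Control"),
]

# Key on (property, value): value names alone are not unique across properties.
index: Dict[Tuple[str, str], str] = {}
for prop, short, long_name in VALUE_ALIASES:
    for alias in (short, long_name):
        key = (loose(prop), loose(alias))
        existing = index.get(key)
        if existing is not None and existing != long_name:
            raise ValueError(f"ambiguous alias {alias!r} for {prop!r}")
        index[key] = long_name


def lookup(prop: str, value: str) -> str:
    """Return the long value name for a loosely matched (property, value)."""
    return index[(loose(prop), loose(value))]


print(lookup("bc", "al"))          # Arabic_Letter
print(lookup("cc", "A L"))         # Above_Left
print(lookup("lb", "alphabetic"))  # Alphabetic
print(lookup("gc", "cc"))          # Control
```

The collision check flags any alias that would map to two different long names within the same property after loose matching, which is the condition the header relies on when it says the combination is unique.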

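The draft also defines two special first-field values: xx holds the value names shared by every binary property, and qc holds those shared by every quick-check property. Because the quick-check properties are themselves listed under ZZ, a resolver has to test for them before falling back to xx. The sketch below assumes that a property counts as quick-check when its long name ends in `_Quick_Check`; that rule, and the names `value_table` and `value_names`, are hypothetical and not stated in the proposal.

```python
from typing import Dict, List, Tuple

# Shared value aliases under the draft's two special first fields.
SHARED_VALUES: Dict[str, List[Tuple[str, str]]] = {
    "xx": [("F", "False"), ("T", "True")],
    "qc": [("M", "Maybe"), ("N", "No"), ("Y", "Yes")],
}

# A few ZZ entries from the draft: short name -> long name.
BINARY_PROPERTIES: Dict[str, str] = {
    "Alpha": "Alphabetic",
    "WSpace": "White_Space",
    "NFC_QC": "NFC_Quick_Check",
    "NFD_QC": "NFD_Quick_Check",
}


def value_table(prop_short: str) -> str:
    """Choose which special first field holds values for a ZZ property.

    Assumption: a long name ending in '_Quick_Check' selects 'qc'; this is
    checked first because the draft lists quick-check properties under ZZ.
    """
    long_name = BINARY_PROPERTIES[prop_short]
    if long_name.endswith("_Quick_Check"):
        return "qc"
    return "xx"


def value_names(prop_short: str) -> List[str]:
    """Long value names available for the given ZZ property."""
    return [long_name for _, long_name in SHARED_VALUES[value_table(prop_short)]]


print(value_names("Alpha"))   # ['False', 'True']
print(value_names("NFC_QC"))  # ['Maybe', 'No', 'Yes']
```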