L2/12-074R1


Subject:             Property Metadata: Status
To:                     UTC
From:                   Mark Davis
Date:                   2012-02-05 (revised 2012-11-06)

Live doc:        http://goo.gl/wMEbd 

In http://www.unicode.org/L2/L2010/10052-metaprop.txt, Ken presented a proposal for property metadata. While there was a lot of value to that proposal, overall it is rather complicated. I think we can make incremental progress by:

  1. focusing on the features of properties that are most important to implementers.
  2. taking the items one-by-one, and proposing concrete data files for them.
  3. removing distinctions that are not really necessary.

Right now, we have quite a number of different metaproperty definitions that are a “kind of status”, including what we call status in UAX #44:

Third Column. This column indicates the status of the property: Normative or Informative or Contributory or Provisional.

We also have the related features Immutable, Deprecated, Contributory, Stabilized, Overridable, and Obsolete. However, when you look at the use in practice, some features and combinations of features are simply not necessary. I think with a few small changes we can have a much simpler, more understandable, and more useful overall picture to present to developers. Some of these recommendations could be done in v6.3, while others might wait for v7.0.

The data for this is v6.1 data scraped from UAX #44 (scraped, because we don’t have machine-readable files).

In detail:

1. Overridable

There is only one property indicated in the standard to be Overridable: Canonical_Decomposition. Moreover, (a) we don’t give any real indication of the actual implementation considerations (cf. 12-075), and (b) the real purpose of “Overridable” was to make an algorithm stable, and we have achieved that with the stability policies for normalization. So Overridable is unnecessary.

Recommendation: remove Overridable

2. Obsolete & Stabilized

There are only two properties that qualify as Stabilized, and only one as Obsolete:

ISO_Comment;        Informative;        Deprecated;        Stabilized;        Obsolete

Hyphen;        Informative;        Deprecated;        Stabilized

These two characteristics add no useful information for implementers beyond Deprecated, and both properties are already Deprecated. So they are unnecessary; we can just use Deprecated.

Recommendation: remove Obsolete & Stabilized

3. Deprecated & Informative

That leaves a small number of Deprecated properties, and they are all Informative: FC_NFKC_Closure, Expands_On_NFC, Expands_On_NFD, Expands_On_NFKC, Expands_On_NFKD, Grapheme_Link, Hyphen, ISO_Comment. Yet the Informative status doesn’t add anything once a property is Deprecated: the Deprecated feature trumps Informative. The picture is simpler if Deprecated is folded into the single Status.

Recommendation: make Deprecated into a Status value.

4. Contributory & Immutable

There is only one such property:

Jamo_Short_Name;        Contributory;        Immutable

For Contributory properties, the feature “Immutable” is not important: the status that matters is that of the property they contribute to. It is the Name property that needs to be Immutable, not anything that contributes to it. If Immutable is removed from Jamo_Short_Name, then the way is paved for making Immutable just another status value, a type of Normative. And it doesn’t seem important to allow an Informative property to be Immutable; if it is that important that a property be Immutable, it should be Normative.

Recommendation: make Immutable into a Status value.

Note: I think this is less important; although it makes sense to me to combine Immutable into a Status value, it wouldn’t be too bad to retain Immutable as a separate metaproperty either.
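
Taken together, recommendations 1-4 amount to a mapping from the current set of metadata flags on a property to one value. A minimal Python sketch of that mapping follows. The function name collapse_status and the exact precedence order are my assumptions for illustration, drawn from the reasoning above; they are not defined anywhere in UAX #44.

```python
# Hypothetical helper: collapse the metadata flags currently attached to a
# property into the single status value proposed in recommendations 1-4.

# Recommendations 1 and 2: these flags are simply dropped.
DROPPED = {"Overridable", "Obsolete", "Stabilized"}


def collapse_status(flags):
    """Return one status for a list of flags, for example
    ["Informative", "Deprecated", "Stabilized"]."""
    kept = [flag for flag in flags if flag not in DROPPED]
    # Recommendation 3: Deprecated trumps any other status.
    if "Deprecated" in kept:
        return "Deprecated"
    # Recommendation 4: Immutable is not important on a Contributory property.
    if "Contributory" in kept:
        return "Contributory"
    # Recommendation 4: Immutable becomes a status, a type of Normative.
    if "Immutable" in kept:
        return "Immutable"
    for status in ("Normative", "Informative", "Provisional"):
        if status in kept:
            return status
    raise ValueError(f"no status found in {flags!r}")
```

For instance, the flags shown earlier for ISO_Comment and Hyphen both collapse to Deprecated, and those for Jamo_Short_Name collapse to Contributory.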

5. Property_Status

What I propose once we have 1-4 is an enumerated metaproperty called Property_Status, with values {Immutable, Normative, Informative, Provisional, Contributory, Deprecated}, contained in a text file called PropertyStatus.txt. We’d also need to touch up some parts of UAX #44 and Chapter 3 to reflect the above changes.
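
To show how an implementer might consume such a file, here is a minimal Python sketch. It assumes one "PropertyName; Status" record per line, blank lines ignored, and "#" starting a comment; the names PropertyStatus and parse_property_status are hypothetical, and the three sample records are only an excerpt chosen for illustration.

```python
from enum import Enum


class PropertyStatus(Enum):
    """The six proposed values of the Property_Status metaproperty."""
    IMMUTABLE = "Immutable"
    NORMATIVE = "Normative"
    INFORMATIVE = "Informative"
    PROVISIONAL = "Provisional"
    CONTRIBUTORY = "Contributory"
    DEPRECATED = "Deprecated"


def parse_property_status(text):
    """Parse 'PropertyName; Status' records into a dict.

    Assumed format (not specified in detail by the proposal): one record
    per line, '#' starts a comment, blank lines are skipped.
    """
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, status = (field.strip() for field in line.split(";"))
        result[name] = PropertyStatus(status)  # ValueError on unknown status
    return result


sample = """
# PropertyName;        Status

Name;        Immutable
Age;        Normative
Hyphen;        Deprecated
"""

statuses = parse_property_status(sample)
```

Using an enumeration means that a misspelled or unknown status value in the file fails loudly at parse time instead of propagating into client code.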

Here is a proposal for the initial contents of that file. Of course, as we add more properties or change their status, we’d record those changes in the file.

# PropertyName;        Status


Decomposition_Mapping;        Immutable

Name;        Immutable

Canonical_Combining_Class;        Immutable

Pattern_Syntax;        Immutable

Pattern_White_Space;        Immutable

Numeric_Value;        Normative

Case_Folding;        Normative

Simple_Case_Folding;        Normative

Simple_Lowercase_Mapping;        Normative

Simple_Titlecase_Mapping;        Normative

Simple_Uppercase_Mapping;        Normative

kCompatibilityVariant;        Normative

Name_Alias;        Normative

kIICore;        Normative

kIRG_GSource;        Normative

kIRG_HSource;        Normative

kIRG_JSource;        Normative

kIRG_KPSource;        Normative

kIRG_KSource;        Normative

kIRG_MSource;        Normative

kIRG_TSource;        Normative

kIRG_USource;        Normative

kIRG_VSource;        Normative

Age;        Normative

Block;        Normative

Bidi_Class;        Normative

Decomposition_Type;        Normative

General_Category;        Normative

Hangul_Syllable_Type;        Normative

Joining_Group;        Normative

Joining_Type;        Normative

Line_Break;        Normative

NFC_Quick_Check;        Normative

NFD_Quick_Check;        Normative

NFKC_Quick_Check;        Normative

NFKD_Quick_Check;        Normative

Numeric_Type;        Normative

ASCII_Hex_Digit;        Normative

Bidi_Control;        Normative

Bidi_Mirrored;        Normative

Composition_Exclusion;        Normative

Default_Ignorable_Code_Point;        Normative

Deprecated;        Normative

Full_Composition_Exclusion;        Normative

Grapheme_Base;        Normative

Grapheme_Extend;        Normative

IDS_Binary_Operator;        Normative

IDS_Trinary_Operator;        Normative

Join_Control;        Normative

Logical_Order_Exception;        Normative

Noncharacter_Code_Point;        Normative

Radical;        Normative

Soft_Dotted;        Normative

Unified_Ideograph;        Normative

Variation_Selector;        Normative

White_Space;        Normative

kAccountingNumeric;        Informative

kOtherNumeric;        Informative

kPrimaryNumeric;        Informative

Lowercase_Mapping;        Informative

NFKC_Casefold;        Informative

Titlecase_Mapping;        Informative

Uppercase_Mapping;        Informative

Bidi_Mirroring_Glyph;        Informative

Script_Extensions;        Informative

Unicode_1_Name;        Informative

kMandarin;        Informative

kRSUnicode;        Informative

kTotalStrokes;        Informative

Script;        Informative

East_Asian_Width;        Informative

Grapheme_Cluster_Break;        Informative

Sentence_Break;        Informative

Word_Break;        Informative

Alphabetic;        Informative

Case_Ignorable;        Informative

Cased;        Informative

Changes_When_Casefolded;        Informative

Changes_When_Casemapped;        Informative

Changes_When_Lowercased;        Informative

Changes_When_NFKC_Casefolded;        Informative

Changes_When_Titlecased;        Informative

Changes_When_Uppercased;        Informative

Dash;        Informative

Diacritic;        Informative

Extender;        Informative

Hex_Digit;        Informative

ID_Continue;        Informative

ID_Start;        Informative

Ideographic;        Informative

Lowercase;        Informative

Math;        Informative

Quotation_Mark;        Informative

STerm;        Informative

Terminal_Punctuation;        Informative

Uppercase;        Informative

XID_Continue;        Informative

XID_Start;        Informative

CJK_Radical;        Provisional

Emoji_DCM;        Provisional

Emoji_KDDI;        Provisional

Emoji_SB;        Provisional

Named_Sequences;        Provisional

Named_Sequences_Prov;        Provisional

Standardized_Variant;        Provisional

kBigFive;        Provisional

kCCCII;        Provisional

kCNS1986;        Provisional

kCNS1992;        Provisional

kCangjie;        Provisional

kCantonese;        Provisional

kCheungBauer;        Provisional

kCheungBauerIndex;        Provisional

kCihaiT;        Provisional

kCowles;        Provisional

kDaeJaweon;        Provisional

kDefinition;        Provisional

kEACC;        Provisional

kFenn;        Provisional

kFennIndex;        Provisional

kFourCornerCode;        Provisional

kFrequency;        Provisional

kGB0;        Provisional

kGB1;        Provisional

kGB3;        Provisional

kGB5;        Provisional

kGB7;        Provisional

kGB8;        Provisional

kGSR;        Provisional

kGradeLevel;        Provisional

kHDZRadBreak;        Provisional

kHKGlyph;        Provisional

kHKSCS;        Provisional

kHanYu;        Provisional

kHangul;        Provisional

kHanyuPinlu;        Provisional

kHanyuPinyin;        Provisional

kIBMJapan;        Provisional

kIRGDaeJaweon;        Provisional

kIRGDaiKanwaZiten;        Provisional

kIRGHanyuDaZidian;        Provisional

kIRGKangXi;        Provisional

kJIS0213;        Provisional

kJapaneseKun;        Provisional

kJapaneseOn;        Provisional

kJis0;        Provisional

kJis1;        Provisional

kKPS0;        Provisional

kKPS1;        Provisional

kKSC0;        Provisional

kKSC1;        Provisional

kKangXi;        Provisional

kKarlgren;        Provisional

kKorean;        Provisional

kLau;        Provisional

kMainlandTelegraph;        Provisional

kMatthews;        Provisional

kMeyerWempe;        Provisional

kMorohashi;        Provisional

kNelson;        Provisional

kPhonetic;        Provisional

kPseudoGB1;        Provisional

kRSAdobe_Japan1_6;        Provisional

kRSJapanese;        Provisional

kRSKanWa;        Provisional

kRSKangXi;        Provisional

kRSKorean;        Provisional

kSBGY;        Provisional

kSemanticVariant;        Provisional

kSimplifiedVariant;        Provisional

kSpecializedSemanticVariant;        Provisional

kTaiwanTelegraph;        Provisional

kTang;        Provisional

kTraditionalVariant;        Provisional

kVietnamese;        Provisional

kXHC1983;        Provisional

kXerox;        Provisional

kZVariant;        Provisional

Indic_Matra_Category;        Provisional

Indic_Syllabic_Category;        Provisional

Jamo_Short_Name;        Contributory

Other_Alphabetic;        Contributory

Other_Default_Ignorable_Code_Point;        Contributory

Other_Grapheme_Extend;        Contributory

Other_ID_Continue;        Contributory

Other_ID_Start;        Contributory

Other_Lowercase;        Contributory

Other_Math;        Contributory

Other_Uppercase;        Contributory

FC_NFKC_Closure;        Deprecated

ISO_Comment;        Deprecated

Expands_On_NFC;        Deprecated

Expands_On_NFD;        Deprecated

Expands_On_NFKC;        Deprecated

Expands_On_NFKD;        Deprecated

Grapheme_Link;        Deprecated

Hyphen;        Deprecated
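
Because changes in status would be recorded in this file from version to version, a machine-readable file also makes it easy to report what changed between two releases. The following Python sketch compares two property-to-status mappings. The function name diff_status is hypothetical, and the property names and statuses in the example are invented placeholders, not actual or proposed changes.

```python
def diff_status(old, new):
    """Compare two {property name: status} mappings.

    Returns (added, removed, changed), where changed maps a property
    name to its (old status, new status) pair.
    """
    added = {p: s for p, s in new.items() if p not in old}
    removed = {p: s for p, s in old.items() if p not in new}
    changed = {
        p: (old[p], new[p])
        for p in old
        if p in new and old[p] != new[p]
    }
    return added, removed, changed


# Placeholder data for illustration only; these are not real properties.
old_version = {"Example_A": "Provisional", "Example_B": "Informative"}
new_version = {
    "Example_A": "Informative",
    "Example_B": "Informative",
    "Example_C": "Provisional",
}

added, removed, changed = diff_status(old_version, new_version)
```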