L2/08-287

Public Review Issue #122

Proposal for Additional Deprecated Characters

The Unicode Technical Committee is considering giving a number of additional characters the Deprecated property.

The Deprecated property indicates that use of a character is discouraged, and it gives implementations a machine-readable table of such characters. However, a number of characters are marked as "discouraged" only in the text of the standard or in the names list. The goal is either to add them to the set of characters with the Deprecated property or, if there is good reason not to, to remove the wording about their being discouraged. These characters are listed below in Table 1.

Table 2 lists additional characters that were proposed for deprecation when this topic was discussed in the UTC. Table 3 gives the characters deprecated in Unicode 5.1 (U5.1), for comparison.

As part of this proposal, we would add text that makes the following points more clearly.


The UTC would appreciate feedback on this proposal.


Table 1. Discouraged

Characters marked as discouraged (or "not encouraged") either in the names list or in the text of the standard. Characters marked ** cannot occur in NFKC; those marked * cannot occur in NFC (and therefore cannot occur in NFKC either).
(http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\u0344\u037E\u0387\u0F73\u0F75\u0F77\u0F79\u0F81\u17A4\u17B4\u17D8\u20A4\u2126\u212A\u212B\u2329\u232A] )

0344 ( ̈́ ) COMBINING GREEK DIALYTIKA TONOS *
037E ( ; ) GREEK QUESTION MARK *
0387 ( · ) GREEK ANO TELEIA *
0F73 TIBETAN VOWEL SIGN II *
0F75 TIBETAN VOWEL SIGN UU *
0F77 TIBETAN VOWEL SIGN VOCALIC RR **
0F79 TIBETAN VOWEL SIGN VOCALIC LL **
0F81 TIBETAN VOWEL SIGN REVERSED II *
17A4 ( ឤ ) KHMER INDEPENDENT VOWEL QAA
17B4 KHMER VOWEL INHERENT AQ
17D8 ( ៘ ) KHMER SIGN BEYYAL
20A4 ( ₤ ) LIRA SIGN
2126 ( Ω ) OHM SIGN *
212A ( K ) KELVIN SIGN *
212B ( Å ) ANGSTROM SIGN *
2329 ( 〈 ) LEFT-POINTING ANGLE BRACKET *
232A ( 〉 ) RIGHT-POINTING ANGLE BRACKET *
The preferred forms for these are:
U+27E8 ( ⟨ ) MATHEMATICAL LEFT ANGLE BRACKET
U+27E9 ( ⟩ ) MATHEMATICAL RIGHT ANGLE BRACKET
while the NFC and NFKC forms are:
U+3008 ( 〈 ) LEFT ANGLE BRACKET
U+3009 ( 〉 ) RIGHT ANGLE BRACKET
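
The * and ** markers in Table 1 can be spot-checked with any normalization library. The following is a minimal sketch in Python, assuming the standard unicodedata module (whose Unicode data version depends on the Python release). The helper name marker is hypothetical and is introduced here only for illustration.

```python
import unicodedata

# Code points transcribed from Table 1.
TABLE_1 = [
    0x0344, 0x037E, 0x0387, 0x0F73, 0x0F75, 0x0F77, 0x0F79, 0x0F81,
    0x17A4, 0x17B4, 0x17D8, 0x20A4, 0x2126, 0x212A, 0x212B, 0x2329, 0x232A,
]

def marker(cp):
    """Return '*' if the character changes under NFC, '**' if it changes
    only under NFKC, and '' if both forms leave it unchanged."""
    ch = chr(cp)
    if unicodedata.normalize("NFC", ch) != ch:
        return "*"
    if unicodedata.normalize("NFKC", ch) != ch:
        return "**"
    return ""

for cp in TABLE_1:
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"{cp:04X} {name} {marker(cp)}".rstrip())

# The NFC forms of the deprecated angle brackets, as code points.
for cp in (0x2329, 0x232A):
    nfc = unicodedata.normalize("NFC", chr(cp))
    print(f"{cp:04X} -> " + " ".join(f"{ord(c):04X}" for c in nfc))
```

Testing a single isolated character against its own normalized form is enough here, because a character that can never occur in a normalization form is always changed by that form.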


Table 2. Additional Proposed Deprecations

Characters proposed to the UTC during discussion.
(http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[\u0149\u0953\u0954\u0F07] )

0149 ( ʼn ) LATIN SMALL LETTER N PRECEDED BY APOSTROPHE **
The preferred form is 'n or ’n (with U+2019): this is only one of many such abbreviations in Dutch and Afrikaans, all of which are represented with an apostrophe followed by a letter. The NFKC form does not match this preferred form, because it uses U+02BC ( ʼ ) MODIFIER LETTER APOSTROPHE instead of an apostrophe.
0953 ( ॓ ) DEVANAGARI GRAVE ACCENT
0954 ( ॔ ) DEVANAGARI ACUTE ACCENT
0F07 ( ༇ ) TIBETAN MARK YIG MGO TSHEG SHAD MA
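
The note on U+0149 in Table 2 can be checked directly. The following is a small sketch in Python, assuming the standard unicodedata module; the variable name nfkc is chosen here only for illustration.

```python
import unicodedata

# NFKC form of U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE.
nfkc = unicodedata.normalize("NFKC", "\u0149")

# List the code points that make up the NFKC form.
print([f"U+{ord(c):04X}" for c in nfkc])

# Compare against the two preferred spellings named in the note.
print(nfkc == "'n")        # U+0027 APOSTROPHE plus n
print(nfkc == "\u2019n")   # U+2019 RIGHT SINGLE QUOTATION MARK plus n
```

Both comparisons are expected to report a mismatch, since the NFKC form begins with U+02BC rather than U+0027 or U+2019.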


Table 3. U5.1 Deprecated

For comparison, the following characters are Deprecated in U5.1.
(http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:deprecated:])

U+0340 ( ̀ ) COMBINING GRAVE TONE MARK
U+0341 ( ́ ) COMBINING ACUTE TONE MARK
U+17A3 ( ឣ ) KHMER INDEPENDENT VOWEL QAQ
U+17D3 ( ៓ ) KHMER SIGN BATHAMASAT
U+206A (  ) INHIBIT SYMMETRIC SWAPPING
U+206B (  ) ACTIVATE SYMMETRIC SWAPPING
U+206C (  ) INHIBIT ARABIC FORM SHAPING
U+206D (  ) ACTIVATE ARABIC FORM SHAPING
U+206E (  ) NATIONAL DIGIT SHAPES
U+206F (  ) NOMINAL DIGIT SHAPES
U+E0001 (  ) LANGUAGE TAG
U+E0020 (  ) TAG SPACE
...
U+E007F (  ) CANCEL TAG
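
As an illustration of how an implementation might use a machine-readable table such as Table 3, the following Python sketch hard-codes the rows above as inclusive ranges. The names DEPRECATED_5_1, is_deprecated_5_1, and find_deprecated are hypothetical, and the "..." row is read as the contiguous range U+E0020..U+E007F, which is an assumption of this sketch. A real implementation would more likely read the Deprecated property from the Unicode Character Database for the version it supports.

```python
# Inclusive code point ranges transcribed from Table 3.
# Assumption: the "..." row stands for every code point between
# U+E0020 and U+E007F.
DEPRECATED_5_1 = [
    (0x0340, 0x0341),
    (0x17A3, 0x17A3),
    (0x17D3, 0x17D3),
    (0x206A, 0x206F),
    (0xE0001, 0xE0001),
    (0xE0020, 0xE007F),
]

def is_deprecated_5_1(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in DEPRECATED_5_1)

def find_deprecated(text):
    """Return (index, 'U+XXXX') pairs for each listed character in text."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(text)
        if is_deprecated_5_1(c)
    ]

print(find_deprecated("a\u0340b\u206Ac"))
```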


Table 4. Characters not occurring in NFC

For comparison, the following characters cannot occur in NFC text.
(http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:nfc_quick_check=no:])
 

Table 5. Characters not occurring in NFKC

Characters that don't occur in NFKC are not closely related to deprecation, but for comparison they can be referenced through the following link:

(http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[:nfkc_quick_check=no:]-[:nfc_quick_check=no:]])
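
The two sets behind Tables 4 and 5 can be approximated offline. The following Python sketch assumes the standard unicodedata module and uses a simple test: a character is treated as unable to occur in a form when normalizing that character alone changes it. The set names nfc_no and nfkc_only are hypothetical, and the results depend on the Unicode version bundled with the Python release, so they may differ from what the links above return.

```python
import sys
import unicodedata

nfc_no = set()     # approximates [:nfc_quick_check=no:]
nfkc_only = set()  # approximates [[:nfkc_quick_check=no:]-[:nfc_quick_check=no:]]

for cp in range(sys.maxunicode + 1):
    # Skip surrogate code points, which are not characters.
    if 0xD800 <= cp <= 0xDFFF:
        continue
    ch = chr(cp)
    if unicodedata.normalize("NFC", ch) != ch:
        nfc_no.add(cp)
    elif unicodedata.normalize("NFKC", ch) != ch:
        # Changed by NFKC but not by NFC: the set difference in Table 5.
        nfkc_only.add(cp)

print(len(nfc_no), len(nfkc_only))
print(0x2126 in nfc_no)      # OHM SIGN, marked * in Table 1
print(0x0149 in nfkc_only)   # marked ** in Table 2
```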