 
277 Reconciling Script and Script_Extensions Character Properties
Closing Date: 2014.10.20
Status: Open
Originator: UTC
Informal Discussion: Unicode Mail List (Join)
Formal Feedback: Contact Form
 

Description of Issue:

There are currently a small number of characters whose Script value is explicit (neither Common nor Inherited) and whose Script_Extensions value set has more than one value (a “diverse” value set). An example is U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM, which has Script=Arabic, and Script_Extensions={Arabic Thaana}. These characters are not typical; most characters with a diverse Script_Extensions value set have a Script value of either Common or Inherited.

The Unicode Standard states no principle for these cases: a character’s Script value may be explicit even though its Script_Extensions value set is diverse, but the standard does not document what that means or why the Script value is not Common or Inherited. This anomaly has a cost in usability and understandability. Because users of Unicode data are given no indication of when or why it is done, nothing of value is gained in exchange for that cost.

The Unicode Technical Committee would like to eliminate the ambiguity and move to one of the following two policies, A or B. It would appreciate feedback on which approach is preferred.

Where a character’s Script_Extensions value set has more than one element:

Policy A. The character’s Script value must not be explicit.

There are exactly 2 states:

  1. Script_Extensions = {Script}
  2. Script_Extensions ⊅ {Script} & Script is !explicit

Examples:

  1. scx={Common} & sc=Common; scx={Arabic} & sc=Arabic
  2. scx={Arabic, Syriac} & sc=Common
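
As an illustration only, the two states of Policy A can be written as a small check in Python. This is a sketch under assumptions: the function name `conforms_to_policy_a` and the constant `NON_EXPLICIT` are hypothetical, property values are passed in as plain strings rather than read from Unicode data files, and state 2 is read as "Script is not a member of Script_Extensions".

```python
# Script values that are not "explicit" in the sense used in this issue.
NON_EXPLICIT = {"Common", "Inherited"}


def conforms_to_policy_a(sc, scx):
    """Return True if the (Script, Script_Extensions) pair is in state 1 or 2.

    sc  -- the Script value, as a string
    scx -- the Script_Extensions value set, as a set of strings
    """
    # State 1: Script_Extensions = {Script}
    if scx == {sc}:
        return True
    # State 2 (assumed reading): Script is not in the set and is not explicit
    return sc not in scx and sc in NON_EXPLICIT
```

Under this reading, the U+FDF2 case from the description, `conforms_to_policy_a("Arabic", {"Arabic", "Thaana"})`, is rejected, while `conforms_to_policy_a("Common", {"Arabic", "Syriac"})` is accepted.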

Policy B. The character’s Script value must not be explicit, except where that script is a reasonable default value.

There is one more state (3):

  1. Script_Extensions = {Script}
  2. Script_Extensions ⊅ {Script} & Script is !explicit
  3. Script_Extensions ⊋ {Script} & Script is explicit

Examples:

  1. scx={Common} & sc=Common; scx={Arabic} & sc=Arabic
  2. scx={Arabic, Syriac} & sc=Common
  3. scx={Arabic, Syriac} & sc=Arabic
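
The three states of Policy B could be distinguished along the following lines. This is a hedged sketch, not part of the proposal: `policy_b_state` and `NON_EXPLICIT` are hypothetical names, values are plain strings, and state 2 is again read as "Script is not a member of Script_Extensions". Whether the explicit script is a "reasonable default value" is a judgment the code cannot make, so state 3 is checked only structurally.

```python
# Script values that are not "explicit" in the sense used in this issue.
NON_EXPLICIT = {"Common", "Inherited"}


def policy_b_state(sc, scx):
    """Return 1, 2, or 3 for the Policy B state of (sc, scx), or None.

    sc  -- the Script value, as a string
    scx -- the Script_Extensions value set, as a set of strings
    """
    explicit = sc not in NON_EXPLICIT
    # State 1: Script_Extensions = {Script}
    if scx == {sc}:
        return 1
    # State 2 (assumed reading): Script not in the set, and not explicit
    if sc not in scx and not explicit:
        return 2
    # State 3: the set properly contains {Script}, and Script is explicit
    if sc in scx and len(scx) > 1 and explicit:
        return 3
    # Anything else fits none of the three states
    return None
```

For the examples above, `policy_b_state("Arabic", {"Arabic", "Syriac"})` yields state 3, whereas an explicit Script value that does not appear in the set at all yields `None`.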

Note: If the committee adopts Policy A, any implementation could still achieve the effect of Policy B with its own data, for example by processing Script_Extensions to choose the Recommended Script from that value set when there is exactly one, or by picking among the scripts of the languages the implementation supports.
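
One way an implementation might do this is sketched below. Everything here is an assumption made for illustration: the function `default_script`, its `supported` parameter, and in particular the `RECOMMENDED` set, which is a stand-in and not an authoritative list of recommended scripts.

```python
# Illustrative stand-in only; an implementation would supply its own list.
RECOMMENDED = {"Arabic", "Latin", "Cyrillic"}


def default_script(scx, supported=()):
    """Pick one default script from a Script_Extensions set, or None.

    scx       -- the Script_Extensions value set, as a set of strings
    supported -- scripts of the implementation-supported languages,
                 in order of preference (hypothetical parameter)
    """
    # First choice: the recommended script, if the set has exactly one
    recommended = scx & RECOMMENDED
    if len(recommended) == 1:
        return next(iter(recommended))
    # Otherwise: the first implementation-supported script found in the set
    for script in supported:
        if script in scx:
            return script
    # No basis for a default; the caller keeps treating the set as diverse
    return None
```

With this stand-in data, `default_script({"Arabic", "Syriac"})` selects Arabic, and `default_script({"Arabic", "Latin"}, supported=["Latin"])` falls back to the supported-script list and selects Latin.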

How to Provide Feedback: For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions. The accumulated feedback received so far on this issue is shown below, or you can look at a full page view. Feedback is reviewed by the relevant committee according to their meeting schedule.

 
