L2/00-134
ISO/IEC JTC1/SC2/WG2 N_____
DATE: 2000-04-07
| DOC TYPE: | Expert contribution | 
| TITLE: | Proposal to Encode Urdu
  Numbers to Remove Ambiguity in Current Standard | 
| SOURCE: | Paul Nelson (Redmond, WA,
  USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John
  Clews (UK)  | 
| PROJECT: |   | 
| STATUS: | Proposal | 
| ACTION ID: | FYI | 
| DUE DATE: | -- | 
| DISTRIBUTION: | Worldwide | 
| MEDIUM: | Paper and web | 
| NO. OF PAGES: | 3 | 
| A. Administrative | |
| 1. Title | Proposal to Encode Urdu
  Numbers to Remove Ambiguity in Current Standard. | 
| 2. Requester's name | Paul Nelson (Redmond, WA,
  USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John
  Clews (UK). | 
| 3. Requester type | Expert request. | 
| 4. Submission date | 1998-11-06 | 
| 5. Requester's reference |   | 
| 6a. Completion | This is a complete proposal. | 
| 6b. More information to be
  provided? | Only as required for
  clarification. | 
 
| B. Technical - General | |
| 1a. New script? Name? | No. | 
| 1b. Addition of characters to
  existing block? Name? | Yes. Arabic. | 
| 2. Number of characters | 10. | 
| 3. Proposed category |   | 
| 4. Proposed level of
  implementation and rationale |   | 
| 5a. Character names included
  in proposal? | Yes. | 
| 5b. Character names in
  accordance with guidelines? | Yes. | 
| 5c. Character shapes
  reviewable? | Yes. | 
| 6a. Who will provide
  computerized font? | Paul Nelson. | 
| 6b. Font currently available? | Paul Nelson. | 
| 6c. Font format? | TrueType. | 
| 7a. Are references (to other
  character sets, dictionaries, descriptive texts, etc.) provided? | Yes. | 
| 7b. Are published examples
  (such as samples from newspapers, magazines, or other sources) of use of proposed
  characters attached? | Yes. | 
| 8. Does the proposal address
  other aspects of character data processing? |   | 
 
 
| C. Technical - Justification | |
| 1. Contact with the user
  community? | Yes. Ashhar Farhan is Director of Computer
  Corp, a leading Urdu software company for PCs. | 
| 2. Information on the user
  community? | Native. | 
| 3a. The context of use for
  the proposed characters? | The Urdu numerals are currently
  assigned to the same code points as the Farsi numerals (06F0-06F9). For three
  digits (FOUR, SIX and SEVEN) the Farsi and Urdu glyphs differ, so the two
  cannot be differentiated in plain text. | 
| 3b. Reference |   | 
| 4a. Proposed characters in
  current use? | Yes. | 
| 4b. Where? | Native speakers in Pakistan,
  India and worldwide. | 
| 5a. Characters should be
  encoded entirely in BMP? | Already in BMP and in accordance with Roadmap. | 
| 5b. Rationale |   | 
| 6. Should characters be kept
  in a continuous range? | Yes. This greatly facilitates computational usage. | 
| 7a. Can the characters be
  considered a presentation form of an existing character or character
  sequence?  | No. | 
| 7b. Where? |   | 
| 7c. Reference |   | 
| 8a. Can any of the characters
  be considered to be similar (in appearance or function) to an existing
  character? | Yes. However, this proposal's
  goal is to remove the ambiguity from current Unicode assignments. | 
| 8b. Where? | EXTENDED ARABIC-INDIC
  DIGITS [06F0-06F9]. Three of these digits (06F4, 06F6 and 06F7) have
  different glyphs in Farsi and Urdu. | 
| 8c. Reference |   | 
| 9a. Combining characters or
  use of composite sequences included? | N/A. | 
| 9b. List of composite
  sequences and their corresponding glyph images provided? | N/A. | 
| 10. Characters with any
  special properties such as control function, etc. included? | No. | 
 
 
| D. SC2/WG2 Administrative (To be completed by SC2/WG2) | |
| 1. Relevant SC 2/WG 2
  document numbers: |   | 
| 2. Status (list of meeting
  number and corresponding action or disposition) |   | 
| 3. Additional contact to user
  communities, liaison organizations etc. |   | 
| 4. Assigned category and
  assigned priority/time frame |   | 
| Other Comments |   | 
 
The Unicode Standard currently
assigns Urdu and Farsi the same digits (06F0-06F9, EXTENDED ARABIC-INDIC
DIGITS). This makes it ambiguous to represent Farsi and Urdu numbers in plain
text in the same document, and impossible to provide both Farsi and Urdu
number glyphs in the same font. The ambiguity arises from three characters
whose glyph outlines differ between Urdu and Farsi: 06F4 (FOUR), 06F6 (SIX)
and 06F7 (SEVEN). To resolve this problem, and to make Urdu number handling
computationally more efficient, we propose to encode the Urdu numbers in a
contiguous range in the Arabic block. There are three open contiguous areas in
which the ten Urdu digits will fit: 0600-060B, 0610-061A, and 0656-065F.
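The ambiguity described above can be illustrated with a minimal Python sketch. It relies only on the code points named in this proposal (06F4, 06F6 and 06F7 within 06F0-06F9); the helper name `ambiguous_digits` is hypothetical and introduced purely for illustration.

```python
# Code points whose glyph differs between Farsi and Urdu, per this proposal.
AMBIGUOUS = {0x06F4, 0x06F6, 0x06F7}  # FOUR, SIX, SEVEN


def ambiguous_digits(text):
    """Return (index, code point) pairs for digits whose intended glyph
    (Farsi or Urdu) cannot be determined from plain text alone."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in AMBIGUOUS]


sample = "\u06F1\u06F4\u06F6"  # the digits 1, 4, 6
for index, cp in ambiguous_digits(sample):
    print(f"position {index}: U+{cp:04X} is language-dependent")
```

Because the same code point serves both languages, nothing in the string itself tells a renderer which of the two shapes to draw for the flagged positions.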
The Urdu numbers should be
encoded with the glyphs shown below:

| Proposed Unicode | NAME | 
| 0600 | URDU DIGIT ZERO | 
| 0601 | URDU DIGIT ONE | 
| 0602 | URDU DIGIT TWO | 
| 0603 | URDU DIGIT THREE | 
| 0604 | URDU DIGIT FOUR | 
| 0605 | URDU DIGIT FIVE | 
| 0606 | URDU DIGIT SIX | 
| 0607 | URDU DIGIT SEVEN | 
| 0608 | URDU DIGIT EIGHT | 
| 0609 | URDU DIGIT NINE |
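
As a rough illustration of why a contiguous range simplifies computation, the sketch below converts between integers and digit strings by plain offset arithmetic. It assumes the code points proposed in the table above (0600-0609), which are this proposal's suggestion and may not match the assignments in any published standard; `URDU_ZERO`, `to_urdu_digits` and `from_urdu_digits` are hypothetical names.

```python
# Assumption: the proposed URDU DIGIT ZERO from this document, not a
# confirmed assignment in any published version of the standard.
URDU_ZERO = 0x0600


def to_urdu_digits(n):
    """Render a non-negative integer using the proposed contiguous range."""
    if n < 0:
        raise ValueError("non-negative integers only")
    return "".join(chr(URDU_ZERO + int(d)) for d in str(n))


def from_urdu_digits(s):
    """Parse a string of proposed Urdu digits back into an integer."""
    if not s:
        raise ValueError("empty string")
    value = 0
    for ch in s:
        d = ord(ch) - URDU_ZERO  # digit value is a simple offset
        if not 0 <= d <= 9:
            raise ValueError(f"not a proposed Urdu digit: U+{ord(ch):04X}")
        value = value * 10 + d
    return value
```

With a contiguous block, a digit's numeric value is its code point minus the code point of ZERO, so no lookup table is needed.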