Additions to Unicode for Urdu

L2/00-134

ISO/IEC JTC1/SC2/WG2 N_____

DATE: 2000-04-07

DOC TYPE:	Expert contribution
TITLE:	Proposal to Encode Urdu Numbers to Remove Ambiguity in Current Standard
SOURCE:	Paul Nelson (Redmond, WA, USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John Clews (UK)
PROJECT:
STATUS:	Proposal
ACTION ID:	FYI
DUE DATE:	--
DISTRIBUTION:	Worldwide
MEDIUM:	Paper and web
NO. OF PAGES:	3

A. Administrative
1. Title	Proposal to Encode Urdu Numbers to Remove Ambiguity in Current Standard.
2. Requester's name	Paul Nelson (Redmond, WA, USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John Clews (UK).
3. Requester type	Expert request.
4. Submission date	1998-11-06
5. Requester's reference
6a. Completion	This is a complete proposal.
6b. More information to be provided?	Only as required for clarification.

B. Technical General
1a. New script? Name?	No.
1b. Addition of characters to existing block? Name?	Yes. Arabic.
2. Number of characters	10.
3. Proposed category
4. Proposed level of implementation and rationale
5a. Character names included in proposal?	Yes.
5b. Character names in accordance with guidelines?	Yes.
5c. Character shapes reviewable?	Yes.
6a. Who will provide computerized font?	Paul Nelson.
6b. Font currently available?	Paul Nelson.
6c. Font format?	TrueType.
7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?	Yes.
7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?	Yes.
8. Does the proposal address other aspects of character data processing?

C. Technical Justification
1. Contact with the user community?	Yes. Farhan is Director of Computer Corp, a leading Urdu software company for PCs.
2. Information on the user community?	Native.
3a. The context of use for the proposed characters?	The Urdu numerals are currently assigned to the same code points as the Farsi numerals. In three cases (FOUR, SIX and SEVEN), the Farsi and Urdu forms differ but cannot be distinguished in plain text.
3b. Reference
4a. Proposed characters in current use?	Yes.
4b. Where?	Native speakers in Pakistan, India and worldwide.
5a. Characters should be encoded entirely in BMP?	Already in BMP and in accordance with Roadmap.
5b. Rationale
6. Should characters be kept in a continuous range?	Yes. This greatly facilitates computational usage.
7a. Can the characters be considered a presentation form of an existing character or character sequence?	No.
7b. Where?
7c. Reference
8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?	Yes. However, this proposal's goal is to remove the ambiguity from current Unicode assignments.
8b. Where?	EXTENDED ARABIC-INDIC DIGITS [06F0-06F9]. Three of these digits are ambiguous between Farsi and Urdu.
8c. Reference
9a. Combining characters or use of composite sequences included?	N/A.
9b. List of composite sequences and their corresponding glyph images provided?	N/A.
10. Characters with any special properties such as control function, etc. included?	No.

D. SC2/WG2 Administrative (to be completed by SC2/WG2)
1. Relevant SC 2/WG 2 document numbers:
2. Status (list of meeting number and corresponding action or disposition)
3. Additional contact to user communities, liaison organizations etc.
4. Assigned category and assigned priority/time frame
Other Comments

The Unicode Standard currently assigns the Urdu digits to the same code points as the Farsi digits (06F0-06F9 EXTENDED ARABIC-INDIC DIGITS). As a result, Farsi and Urdu digits cannot be distinguished when both appear in the same plain-text document. The current encoding also makes it impossible to represent both the Farsi and the Urdu digit glyphs in the same font. The ambiguity comes from three characters whose glyph outlines differ between Urdu and Farsi: 06F4 (FOUR), 06F6 (SIX) and 06F7 (SEVEN). To resolve this problem, and to make Urdu number handling computationally more efficient, we propose to encode the Urdu digits in a contiguous range in the Arabic block. There are three open contiguous areas in which the ten Urdu digits would fit: 0600-060B, 0610-061A and 0656-065F.
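A minimal Python sketch of the ambiguity, assuming only the standard `unicodedata` module. The character database gives each of U+06F4, U+06F6 and U+06F7 a single name and a single digit value. Nothing in the plain text therefore records whether a Farsi or an Urdu glyph is intended. The helper `ambiguous_digits` is hypothetical. It only reports which of the three contested digits occur in a string.

```python
import unicodedata

# The three EXTENDED ARABIC-INDIC DIGITS whose Farsi and Urdu glyphs differ,
# as identified in this proposal (FOUR, SIX, SEVEN).
AMBIGUOUS = {0x06F4, 0x06F6, 0x06F7}


def ambiguous_digits(text: str) -> list[str]:
    """Return the names of contested digits found in text, in order of appearance.

    Hypothetical helper: plain text carries no language information, so a
    renderer that meets any of these characters cannot tell which glyph to use.
    """
    return [unicodedata.name(ch) for ch in text if ord(ch) in AMBIGUOUS]


if __name__ == "__main__":
    sample = "\u06F1\u06F9\u06F4\u06F7"  # the digits 1, 9, 4, 7
    for ch in sample:
        # One name and one value per code point, whether Farsi or Urdu is meant.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.digit(ch))
    print(ambiguous_digits(sample))
```

Running the script lists FOUR and SEVEN as contested. The name and value printed for each digit would be identical in a Farsi document and in an Urdu one.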

The Urdu digits should be encoded at the following code points:

Proposed code point	Character name
0600	URDU DIGIT ZERO
0601	URDU DIGIT ONE
0602	URDU DIGIT TWO
0603	URDU DIGIT THREE
0604	URDU DIGIT FOUR
0605	URDU DIGIT FIVE
0606	URDU DIGIT SIX
0607	URDU DIGIT SEVEN
0608	URDU DIGIT EIGHT
0609	URDU DIGIT NINE
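The proposed digits occupy one contiguous run. A digit's numeric value is therefore its offset from URDU DIGIT ZERO, and converting ASCII digits is a fixed-offset shift. The sketch below assumes the code points proposed in the table above (0600-0609). These are the proposal's values only, not assignments in any published version of the standard. The constant `URDU_DIGIT_ZERO` and the helpers `urdu_digit_value`, `to_urdu_digits` and `parse_urdu_number` are hypothetical.

```python
# Hypothetical: code point proposed for URDU DIGIT ZERO in this proposal's table.
URDU_DIGIT_ZERO = 0x0600


def urdu_digit_value(ch: str) -> int:
    """Return 0-9 for a proposed Urdu digit; raise ValueError otherwise."""
    offset = ord(ch) - URDU_DIGIT_ZERO
    if 0 <= offset <= 9:
        return offset
    raise ValueError(f"U+{ord(ch):04X} is not in the proposed Urdu digit range")


def to_urdu_digits(text: str) -> str:
    """Replace ASCII digits 0-9 with the proposed Urdu digits; keep other characters."""
    return "".join(
        chr(URDU_DIGIT_ZERO + ord(c) - ord("0")) if "0" <= c <= "9" else c
        for c in text
    )


def parse_urdu_number(text: str) -> int:
    """Interpret a string of proposed Urdu digits as a base-10 integer."""
    value = 0
    for ch in text:
        value = value * 10 + urdu_digit_value(ch)
    return value


if __name__ == "__main__":
    s = to_urdu_digits("2000-04-07")
    print([f"U+{ord(c):04X}" for c in s])
    print(parse_urdu_number(to_urdu_digits("1998")))
```

Both directions use a single subtraction or addition per character. A non-contiguous assignment would instead need a lookup table.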
