L2/00-134

ISO/IEC JTC1/SC2/WG2 N_____

DATE: 2000-04-07

DOC TYPE: Expert contribution

TITLE: Proposal to Encode Urdu Numbers to Remove Ambiguity in Current Standard

SOURCE: Paul Nelson (Redmond, WA, USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John Clews (UK)

PROJECT:

STATUS: Proposal

ACTION ID: FYI

DUE DATE: --

DISTRIBUTION: Worldwide

MEDIUM: Paper and web

NO. OF PAGES: 3


A. Administrative

1. Title

Proposal to Encode Urdu Numbers to Remove Ambiguity in Current Standard.

2. Requester’s name

Paul Nelson (Redmond, WA, USA), Ashhar Farhan (Hyderabad, India), Arif Hisam (Karachi, Pakistan), John Clews (UK).

3. Requester type

Expert request.

4. Submission date

1998-11-06

5. Requester’s reference

 

6a. Completion

This is a complete proposal.

6b. More information to be provided?

Only as required for clarification.

 

B. Technical – General

1a. New script? Name?

No.

1b. Addition of characters to existing block? Name?

Yes. Arabic.

2. Number of characters

10.

3. Proposed category

 

4. Proposed level of implementation and rationale

 

5a. Character names included in proposal?

Yes.

5b. Character names in accordance with guidelines?

Yes.

5c. Character shapes reviewable?

Yes.

6a. Who will provide computerized font?

Paul Nelson.

6b. Font currently available?

Paul Nelson.

6c. Font format?

TrueType.

7a. Are references (to other character sets, dictionaries, descriptive texts, etc.) provided?

Yes.

7b. Are published examples (such as samples from newspapers, magazines, or other sources) of use of proposed characters attached?

Yes.

8. Does the proposal address other aspects of character data processing?

 

 

 

 

C. Technical – Justification

1. Contact with the user community?

Yes. Ashhar Farhan is Director of Computer Corp, a leading Urdu software company for PCs.

2. Information on the user community?

Native.

3a. The context of use for the proposed characters?

The Urdu numerals are currently assigned to the same code points as the Farsi numerals. In three cases the Urdu and Farsi glyphs differ, so the two cannot be distinguished in plain text.

3b. Reference

 

4a. Proposed characters in current use?

Yes.

4b. Where?

Native speakers in Pakistan, India and worldwide.

5a. Characters should be encoded entirely in BMP?

Already in BMP and in accordance with Roadmap.

5b. Rationale

 

6. Should characters be kept in a continuous range?

Yes. This greatly facilitates computational usage.

7a. Can the characters be considered a presentation form of an existing character or character sequence?

No.

7b. Where?

 

7c. Reference

 

8a. Can any of the characters be considered to be similar (in appearance or function) to an existing character?

Yes. However, this proposal's goal is to remove the ambiguity from current Unicode assignments.

8b. Where?

EXTENDED ARABIC-INDIC DIGITS [06F0-06F9]. Three of these digits (06F4, 06F6 and 06F7) are ambiguous between Farsi and Urdu.

8c. Reference

 

9a. Combining characters or use of composite sequences included?

N/A.

9b. List of composite sequences and their corresponding glyph images provided?

N/A.

10. Characters with any special properties such as control function, etc. included?

No.

 

 

D. SC2/WG2 Administrative

To be completed by SC2/WG2

1. Relevant SC 2/WG 2 document numbers:

 

2. Status (list of meeting number and corresponding action or disposition)

 

3. Additional contact to user communities, liaison organizations etc.

 

4. Assigned category and assigned priority/time frame

 

Other Comments

 

 

The Unicode Standard currently assigns Urdu and Farsi to the same set of numbers (06F0-06F9, EXTENDED ARABIC-INDIC DIGITS). This creates an ambiguity when Farsi and Urdu are represented in plain text in the same document, and it makes it impossible to represent both Farsi and Urdu number glyphs in the same font. The ambiguity is caused by three characters whose glyph outlines differ between Urdu and Farsi: 06F4 (FOUR), 06F6 (SIX) and 06F7 (SEVEN). To resolve this problem, and to make Urdu number handling computationally more efficient, we propose to encode the Urdu numbers in a contiguous range in the Arabic block. There are three open contiguous areas in which the Urdu numbers will fit: 0600-060B, 0610-061A and 0656-065F.
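The capacity of the three candidate areas, and the fact that each ambiguous digit has only one code point for both languages, can be checked with a short sketch. The Python below is illustrative only: the names `CANDIDATE_RANGES`, `AMBIGUOUS`, `range_size` and `fits` are assumptions introduced here, and the range bounds are simply those listed above.

```python
import unicodedata

# Candidate open areas in the Arabic block, as listed in this proposal
# (inclusive bounds). These are the proposal's figures, not checked
# against any particular version of the standard.
CANDIDATE_RANGES = [(0x0600, 0x060B), (0x0610, 0x061A), (0x0656, 0x065F)]
DIGITS_NEEDED = 10

# The three digits whose Urdu and Farsi glyphs differ.
AMBIGUOUS = [0x06F4, 0x06F6, 0x06F7]


def range_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range."""
    return last - first + 1


def fits(first: int, last: int, needed: int = DIGITS_NEEDED) -> bool:
    """True if the inclusive range can hold `needed` contiguous characters."""
    return range_size(first, last) >= needed


if __name__ == "__main__":
    for first, last in CANDIDATE_RANGES:
        size = range_size(first, last)
        print(f"{first:04X}-{last:04X}: {size} code points, fits={fits(first, last)}")

    for cp in AMBIGUOUS:
        # A single name and digit value is reported per code point,
        # with no way to say whether Urdu or Farsi is intended.
        ch = chr(cp)
        print(f"{cp:04X} {unicodedata.name(ch)} value={unicodedata.digit(ch)}")
```

Under these bounds each of the three areas holds at least ten code points, so any of them could accommodate the ten digits.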

The Urdu numbers should be encoded with the glyphs shown below:

Proposed Unicode    NAME
0600                URDU DIGIT ZERO
0601                URDU DIGIT ONE
0602                URDU DIGIT TWO
0603                URDU DIGIT THREE
0604                URDU DIGIT FOUR
0605                URDU DIGIT FIVE
0606                URDU DIGIT SIX
0607                URDU DIGIT SEVEN
0608                URDU DIGIT EIGHT
0609                URDU DIGIT NINE
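Because the proposed digits occupy a contiguous range, the numeric value of a digit is just its offset from the code point of ZERO. The following Python is a minimal sketch of that computation, assuming the proposed assignments 0600-0609 from the table above. Those assignments exist only in this proposal and should not be assumed to match any published version of the standard; the names `PROPOSED_URDU_ZERO`, `proposed_table`, `proposed_digit_value` and `parse_proposed_number` are hypothetical. The functions work on integer code points so that no behaviour of real characters is implied.

```python
# Hypothetical base code point: URDU DIGIT ZERO as proposed in this document.
PROPOSED_URDU_ZERO = 0x0600

DIGIT_NAMES = ["ZERO", "ONE", "TWO", "THREE", "FOUR",
               "FIVE", "SIX", "SEVEN", "EIGHT", "NINE"]


def proposed_table():
    """Return (code point, name) pairs for the ten proposed digits."""
    return [(PROPOSED_URDU_ZERO + value, f"URDU DIGIT {name}")
            for value, name in enumerate(DIGIT_NAMES)]


def proposed_digit_value(code_point: int) -> int:
    """Digit value by subtraction, which works only for a contiguous range."""
    offset = code_point - PROPOSED_URDU_ZERO
    if 0 <= offset <= 9:
        return offset
    raise ValueError(f"{code_point:04X} is not a proposed Urdu digit")


def parse_proposed_number(code_points) -> int:
    """Interpret a sequence of proposed digit code points as a decimal number."""
    value = 0
    for cp in code_points:
        value = value * 10 + proposed_digit_value(cp)
    return value


if __name__ == "__main__":
    for cp, name in proposed_table():
        print(f"{cp:04X}  {name}")
    print(parse_proposed_number([0x0601, 0x0609, 0x0609, 0x0608]))
```

If the digits were scattered across non-adjacent code points, the subtraction in `proposed_digit_value` would have to be replaced by a lookup table.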