L2/17-377

Proposal for the encoding of three Arabic tanween characters

Eric Muller — Amazon

October 15, 2017

Background

In L2/15-329, Mussa A. A. Abudena proposes 6 characters (1 through 6) for the vowel signs and tanweens, as they are typically rendered in Qalun’s transmission of Nafi’s reading of the Quran.

I agree with the UTC that characters 1 through 3 are just glyphic variants of damma, dammatan, and open dammatan. The UTC also suggested that the remaining three characters be represented by the combination of damma or fatha with U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM, and of kasra with U+06ED ARABIC SMALL LOW MEEM.

I would like to argue that the representation suggested by the UTC is inadequate, and not just for the typical appearance of Qalun’s transmission of Nafi’s reading, but actually for all calligraphic styles of the Quran. I therefore propose to encode three additional tanween characters, to be used in all transmissions.

Sources

[kfgqpc-h] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Hafs’ transmission of Asim’s reading. This transmission is by far the most popular currently. Unicode text and font available at http://fonts.qurancomplex.gov.sa/?page_id=42 and images downloadable at http://dm.qurancomplex.gov.sa/download/.

[oman] Electronic Mushaf Muscat Calligraphy Project. Also uses Hafs’ transmission of Asim’s reading. A Unicode representation is available only in places. https://www.mushafmuscat.om/.

[kfgqpc-w] edition of the Quran made available by the King Fahd Complex For The Printing Of The Holy Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshNarration as images only.

[tunisian] Tunisian edition of the Quran. It uses Warsh’s transmission of Nafi’s reading. Available at https://archive.org/details/QuranWarshAsbahani. If you read Arabic, you can learn about the various tanween forms on pages 9-11 of the appendix.

[wics] edition of the Quran, published in 1989 by the World Islamic Call Society - Tripoli - Libya. Uses Qalun’s transmission of Nafi’s reading. Available at https://archive.org/details/Ms7FalGmaHRYaH as images only.

The Quran

Nasalization of the three short vowels fatha, kasra, and damma is ordinarily written using fathatan, kasratan, and dammatan (the tanween).

In the Quran, three distinct pronunciations of the tanween are written with different signs:

                     fatha             damma             kasra
ordinary vowel sign  [kfgqpc-h] 2:7    [kfgqpc-h] 2:6    [kfgqpc-h] 2:6
                     [tunisian] 2:7    [tunisian] 2:6    [tunisian] 2:6
                     [kfgqpc-w] 2:6    [kfgqpc-w] 2:5    [kfgqpc-w] 2:5
                     [wics] 2:6        [wics] 2:5        [wics] 2:5
ordinary tanween     [kfgqpc-h] 2:182  [kfgqpc-h] 2:7    [kfgqpc-h] 2:36
                     [tunisian] 2:182  [tunisian] 2:7    [tunisian] 2:36
                     [kfgqpc-w] 2:181  [kfgqpc-w] 2:6    [kfgqpc-w] 2:35
                     [wics] 2:181      [wics] 2:6        [wics] 2:35
open tanween         [kfgqpc-h] 2:182  [kfgqpc-h] 2:7    [kfgqpc-h] 2:36
                     [tunisian] 2:182  [tunisian] 2:7    [tunisian] 2:36
                     [kfgqpc-w] 2:181  [kfgqpc-w] 2:6    [kfgqpc-w] 2:35
                     [wics] 2:181      [wics] 2:6        [wics] 2:35
meem tanween         [kfgqpc-h] 2:95   [kfgqpc-h] 2:10   [kfgqpc-h] 2:99
                     [tunisian] 2:95   [tunisian] 2:10   [tunisian] 2:99
                     [kfgqpc-w] 2:94   [kfgqpc-w] 2:9    [kfgqpc-w] 2:98
                     [wics] 2:94       [wics] 2:9        [wics] 2:98

The fragments shown above are the same parts of the text. Qalun’s and Warsh’s transmissions omit the muqatta'at at the beginning of the surah, hence the shift by one in ayah numbers.

Also relevant to the present discussion is the appearance of a small high meem isolated form used above a noon to indicate that it should be pronounced as a meem.

[kfgqpc-h] 2:27
[tunisian] 2:27
[kfgqpc-w] 2:26
[wics] 2:26

Number of occurrences in [kfgqpc-h]

                    fatha    damma    kasra
vowel sign        123,274   37,334   46,769
ordinary tanween      734      578      606
open tanween        2,901    1,807    1,935
meem tanween          106      134       99

Representation in Unicode

                  fatha                                damma                                kasra
vowel sign        U+064E ARABIC FATHA, ccc=30          U+064F ARABIC DAMMA, ccc=31          U+0650 ARABIC KASRA, ccc=32
ordinary tanween  U+064B ARABIC FATHATAN, ccc=27       U+064C ARABIC DAMMATAN, ccc=28       U+064D ARABIC KASRATAN, ccc=29
open tanween      U+08F0 ARABIC OPEN FATHATAN, ccc=27  U+08F1 ARABIC OPEN DAMMATAN, ccc=28  U+08F2 ARABIC OPEN KASRATAN, ccc=29
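
The names and canonical combining classes in the table above can be cross-checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the `ROWS` dictionary and the `describe` helper are illustrative names, not part of any standard, and the result depends on the Unicode version bundled with the interpreter (U+08F0..08F2 require Unicode 6.1 or later).

```python
import unicodedata

# Code points from the table above, grouped by row (fatha, damma, kasra).
ROWS = {
    "vowel sign": [0x064E, 0x064F, 0x0650],
    "ordinary tanween": [0x064B, 0x064C, 0x064D],
    "open tanween": [0x08F0, 0x08F1, 0x08F2],
}


def describe(cp: int) -> str:
    """Format one code point the way the table does."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)}, ccc={unicodedata.combining(ch)}"


for row, code_points in ROWS.items():
    print(row + ": " + "; ".join(describe(cp) for cp in code_points))
```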

[kfgqpc-h] actually hijacks U+0657, U+065E and U+0656 to represent and display the open tanween, presumably because the text was created before the encoding of U+08F0..08F2.
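
For readers who want to bring such text in line with the current encoding, here is a minimal sketch of the substitution. It assumes that the three code points are listed in the fatha, damma, kasra order used throughout this document, so that U+0657 stands for open fathatan, U+065E for open dammatan, and U+0656 for open kasratan. It also assumes that none of the three is used with its original meaning in the text being converted. The function name `modernize_open_tanween` is hypothetical.

```python
# Assumed pairing (fatha, damma, kasra order); verify against the actual text
# and font before relying on it.
LEGACY_TO_OPEN_TANWEEN = {
    0x0657: 0x08F0,  # ARABIC INVERTED DAMMA      -> ARABIC OPEN FATHATAN
    0x065E: 0x08F1,  # ARABIC FATHA WITH TWO DOTS -> ARABIC OPEN DAMMATAN
    0x0656: 0x08F2,  # ARABIC SUBSCRIPT ALEF      -> ARABIC OPEN KASRATAN
}


def modernize_open_tanween(text: str) -> str:
    """Replace the repurposed code points with U+08F0..08F2."""
    return text.translate(LEGACY_TO_OPEN_TANWEEN)
```

Because the replacement characters have combining classes 27 through 29 rather than those of the repurposed characters, the canonical order of marks on a base may change after conversion, so the result should be renormalized.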

Representation of meem tanween

Both [kfgqpc-h] and [oman] use <U+064E ARABIC FATHA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem fathatan, and similarly <U+064F ARABIC DAMMA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM> for meem dammatan.

[kfgqpc-h] continues the pattern for meem kasratan: <U+0650 ARABIC KASRA, U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM>; whereas [oman] uses <U+0650 ARABIC KASRA, U+06ED ARABIC SMALL LOW MEEM>.
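
To make the consequence for text processing concrete, here is a minimal sketch of what software has to do to find meem tanween under these two conventions. It assumes that the vowel sign and the small meem are adjacent in the text, which need not hold if another mark (for example a shadda) sits between them after normalization. The names `MEEM_TANWEEN`, `LABELS`, and `count_meem_tanween` are illustrative only.

```python
import re
from collections import Counter

FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SMALL_HIGH_MEEM = "\u06E2"  # [kfgqpc-h]: all three vowels; [oman]: fatha, damma
SMALL_LOW_MEEM = "\u06ED"   # [oman]: after kasra

# A vowel sign immediately followed by one of the two small meems.
MEEM_TANWEEN = re.compile(
    f"([{FATHA}{DAMMA}{KASRA}])([{SMALL_HIGH_MEEM}{SMALL_LOW_MEEM}])"
)

LABELS = {FATHA: "meem fathatan", DAMMA: "meem dammatan", KASRA: "meem kasratan"}


def count_meem_tanween(text: str) -> Counter:
    """Count meem tanween sequences in either of the two representations."""
    counts = Counter()
    for match in MEEM_TANWEEN.finditer(text):
        vowel, meem = match.groups()
        if meem == SMALL_LOW_MEEM and vowel != KASRA:
            continue  # not one of the sequences described above
        counts[LABELS[vowel]] += 1
    return counts
```

A small high meem that follows a noon directly, with no vowel sign in between, is not counted, since in that position it marks the pronunciation of the noon rather than a tanween.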

Those representations are not satisfactory.

  1. There is an asymmetry between the three forms of tanween, since the first two forms are encoded atomically and the meem tanween isn’t.
  2. It is quite clear that the meem modifies the vowel sign, not the base character, a situation that is not adequately represented by a combining mark on the base character.
  3. In the typical shapes used for Qalun’s transmission, the shape of the meem in the tanween is clearly distinct from the shape of the meem above a noon. Thus it is awkward to use the same character (U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM) in both cases.
  4. While the systematic use of U+06E2 ARABIC SMALL HIGH MEEM ISOLATED FORM leads to a uniform pattern, it is problematic to use a character with ccc=230 (above) when it is actually displayed below the base. The case of shadda + kasra, where the kasra can be displayed just below the shadda and above the base, is only partially a precedent, as shadda and, more importantly, kasra have fixed combining classes.
  5. Conversely, using different characters for the small meem depending on whether it attaches to fatha or damma on the one hand or to kasra on the other hand is cumbersome at best.
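
The combining classes mentioned in point 4 can be observed directly. The sketch below, using Python's standard `unicodedata` module, prints the classes of the two small meems and shows how canonical ordering treats kasra relative to the small high meem and to shadda. The base letter beh (U+0628) and the helper names are arbitrary choices for illustration.

```python
import unicodedata

BEH = "\u0628"
KASRA, SHADDA = "\u0650", "\u0651"
HIGH_MEEM, LOW_MEEM = "\u06E2", "\u06ED"


def ccc(ch: str) -> int:
    """Canonical combining class of a single character."""
    return unicodedata.combining(ch)


def nfd_code_points(text: str) -> list:
    """Code points of the text after canonical decomposition and reordering."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


# The small high meem is classed as "above" even when drawn with a kasra.
print(ccc(HIGH_MEEM), ccc(LOW_MEEM))             # 230 220

# Kasra (ccc=32) always sorts before the small high meem (ccc=230).
print(nfd_code_points(BEH + HIGH_MEEM + KASRA))  # ['U+0628', 'U+0650', 'U+06E2']

# Kasra (ccc=32) also sorts before shadda (ccc=33), whatever the input order.
print(nfd_code_points(BEH + SHADDA + KASRA))     # ['U+0628', 'U+0650', 'U+0651']
```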

Proposal to encode atomic meem tanween

Consequently, and given the precedent of the open tanween characters, it seems appropriate to encode three new atomic characters for meem tanween, with properties similar to those of the open tanween characters.

08D0;ARABIC MEEM FATHATAN;Mn;27;NSM;;;;;N;;;;;
08D1;ARABIC MEEM DAMMATAN;Mn;28;NSM;;;;;N;;;;;
08D2;ARABIC MEEM KASRATAN;Mn;29;NSM;;;;;N;;;;;
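
As an illustration of what "properties similar to those of the open tanween characters" means, the sketch below parses the three proposed lines as UnicodeData.txt records (15 semicolon-separated fields) and compares general category, combining class, and bidirectional class with those of U+08F0..08F2 as reported by Python's `unicodedata` module. The `Entry` type and the `parse_line` and `matches_model` helpers are hypothetical names; only the proposed lines themselves come from this document.

```python
import unicodedata
from typing import NamedTuple

PROPOSED = """\
08D0;ARABIC MEEM FATHATAN;Mn;27;NSM;;;;;N;;;;;
08D1;ARABIC MEEM DAMMATAN;Mn;28;NSM;;;;;N;;;;;
08D2;ARABIC MEEM KASRATAN;Mn;29;NSM;;;;;N;;;;;
"""


class Entry(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_line(line: str) -> Entry:
    """Parse the first five fields of a UnicodeData.txt-style record."""
    fields = [field.strip() for field in line.split(";")]
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return Entry(int(fields[0], 16), fields[1], fields[2], int(fields[3]), fields[4])


def matches_model(entry: Entry, model: str) -> bool:
    """True if the entry shares gc, ccc and bidi class with an existing character."""
    return (
        entry.general_category == unicodedata.category(model)
        and entry.combining_class == unicodedata.combining(model)
        and entry.bidi_class == unicodedata.bidirectional(model)
    )


entries = [parse_line(line) for line in PROPOSED.splitlines()]
# Pair each proposed character with the corresponding open tanween.
for entry, model in zip(entries, "\u08F0\u08F1\u08F2"):
    print(entry.name, "matches", unicodedata.name(model), ":", matches_model(entry, model))
```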

For the representative glyphs, I suggest using shapes similar to those of [kfgqpc-h] (built using the existing font).

Proposal Summary Form


ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646

Please fill all the sections A, B and C below.
Please read Principles and Procedures Document (P & P) from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/principles.html for guidelines and details before filling this form.
Please ensure you are using the latest Form from
http://std.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html.
See also http://std.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html for latest Roadmaps.

Form number: N4502-F ( Original 1994-10-14; Revised 1995-01, 1995-04, 1996-04, 1996-08, 1999-03, 2001-05, 2001-09, 2003-11, 2005-01, 2005-09, 2005-10, 2007-03, 2008-05, 2009-11, 2011-03, 2012-01)
A. Administrative
1. Title: Proposal for the encoding of three Arabic meem tanween characters
2. Requester's name: Eric Muller (emuller@amazon.com)
3. Requester type (Member body/Liaison/Individual contribution): Individual contribution
4. Submission date: October 15, 2017
5. Requester's reference (if applicable):  
6. Choose one of the following:
  This is a complete proposal: YES
  (or) More information will be provided later:  
B. Technical - General
1. Choose one of the following:
  a. This proposal is for a new script (set of characters): NO
  Proposed name of script:  
  b. The proposal is for addition of character(s) to an existing block: YES
  Name of the existing block: Arabic Extended-A
2. Number of characters in proposal: 3
3. Proposed category (select one from below - see section 2.2 of P&P document):
 A-Contemporary X  B.1-Specialized (small collection) X  B.2-Specialized (large collection)  
 C-Major extinct    D-Attested extinct    E-Minor extinct  
 F-Archaic Hieroglyphic or Ideographic     G-Obscure or questionable usage symbols  
4. Is a repertoire including character names provided? YES
  a. If YES, are the names in accordance with the "character naming guidelines" YES
  b. Are the character shapes attached in a legible form suitable for review? YES
5. Fonts related:
  a. Who will provide the appropriate computerized font to the Project Editor of 10646 for publishing the standard?
  I suggest building the glyphs from the existing font, for consistency.
  b. Identify the party granting a license for use of the font by the editors (include address, e-mail, ftp-site, etc.):
   
6. References:
  a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided? NO
  b. Are published examples of use (such as samples from newspapers, magazines, or other sources)
  of proposed characters attached? YES
7. Special encoding issue
  Does the proposal address other aspects of character data processing (if applicable) such as input,
  presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?  
   
8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see UAX#44: http://www.unicode.org/reports/tr44/ and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.
C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before? NO
  If YES explain  
2. Has contact been made to members of the user community (for example: National Body,
  user groups of the script or characters, other experts, etc.)? NO
  If YES, available relevant documents:  
3. Information on the user community for the proposed characters (for example:
  size, demographics, information technology use, or publishing use) is included?  
  Reference:  
4. The context of use for the proposed characters (type of use; common or rare) Common
  Reference:  
5. Are the proposed characters in current use by the user community? YES
  If YES, where? Reference: Quran
6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely
  in the BMP? YES
  If YES, is a rationale provided? NO
  If Yes, reference:  
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)? YES
8. Can any of the proposed characters be considered a presentation form of an existing
  character or character sequence? YES
  If YES, is a rationale for its inclusion provided? YES
  If Yes, reference:  
9. Can any of the proposed characters be encoded using a composed character sequence of either
  existing characters or other proposed characters? YES
  If YES, is a rationale for its inclusion provided? YES
  If Yes, reference:  
10. Can any of the proposed character(s) be considered to be similar (in appearance or function)
  to, or could be confused with, an existing character? NO
  If YES, is a rationale for its inclusion provided?  
  If Yes, reference:  
11. Does the proposal include use of combining characters and/or use of composite sequences? YES
  If YES, is a rationale for such use provided? YES
  If Yes, reference:  
  Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?  
  If Yes, reference:  
12. Does the proposal contain characters with any special properties such as
  control function or similar semantics? NO
  If YES, describe in detail (include attachment if necessary)  
   
   
13. Does the proposal contain any Ideographic compatibility characters? NO
  If YES, are the equivalent corresponding unified ideographic characters identified?  
  If Yes, reference: