L2/11-071

Subject: Response from Jonathan Kew on Arabic L2/11-069
Date: Thu, 10 Feb 2011 15:59:05 -0600
From: Lorna Priest

This is Jonathan Kew's response to L2/11-069. He gave permission for you to include it in the registry.

***

I'm not sure how comfortable I am with this. He seems to be saying that if someone has a "letter" (as understood by users of an orthography) that visually looks like one of the existing characters combined with one of these "letter-making" marks, it should be encoded as the sequence, not as a new character.

I think there are two distinct situations that come up in practice. One is examples like we see in the Chinese proposals about Uighur, where they propose "letters" that are clearly multigraphs in the Arabic script, but are being treated as separate "letters" on the basis of a phonemic analysis rather than an Arabic-oriented orthographic one. These, I agree, should be rejected, and people should be advised how to encode them as sequences of existing characters. (Even though this sometimes means that the "spelling" of a single phoneme may vary depending on its position in the word.)

The combination of a DAMMA with WAW, for example, or KASRA with YEH (these are also used in Urdu for specific vowel sounds - and they acquire a preceding ALEF when they occur word-initial), should be treated this way. These are cases where the vowel mark is indeed functioning as a vowel; the base consonant is functioning as a carrier/seat for it; and the particular orthography grants a specific phonemic interpretation to the combination. AFAIR, this pattern is normally used to create more vowel sounds than the "core" Arabic script provides. As the resulting multigraph "letters" represent vowels, the need to ADD vowel diacritics to them does not arise.
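As an illustration of the sequence approach described above, here is a minimal Python sketch using the standard `unicodedata` module. It assumes U+0648 WAW, U+064F DAMMA and U+0627 ALEF as the code points involved; the helper name `describe` is hypothetical, and the exact letters a given orthography uses may differ.

```python
import unicodedata

# Assumed code points for the example; a real orthography may use others.
WAW = "\u0648"    # ARABIC LETTER WAW
DAMMA = "\u064F"  # ARABIC DAMMA
ALEF = "\u0627"   # ARABIC LETTER ALEF


def describe(text):
    """Return (code point, character name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


# The vowel "letter" is stored as base letter followed by combining mark.
vowel_digraph = WAW + DAMMA

# Word-initially the same vowel acquires a preceding ALEF, so the
# "spelling" of the one phoneme depends on its position in the word.
initial_form = ALEF + vowel_digraph

for label, seq in (("medial", vowel_digraph), ("word-initial", initial_form)):
    print(label)
    for row in describe(seq):
        print("  ", *row)
```

The sequence is made up entirely of existing characters, one letter (category `Lo`) plus one nonspacing mark (category `Mn`), which is why no new character is needed for this kind of "letter".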

But the second situation is where a mark is used as part of a new _consonant_, and the result functions within the pattern of the script just like the existing consonants - in particular, it may act as the seat for a vowel diacritic. This is clearly how the "nukta" (dot) combinations work, but it also applies to the SMALL TAH. I would suggest that it also applies to the HAMZA-like mark on U+0681 and U+076C, and am not convinced those should have been decomposed.

In general it seems unlikely that the "core" Arabic vowels will get used in this way, because this would lead to confusion with their use as vowel marks. But I've seen at least one exception: some writers use a mark that arguably looks like SUKUN to replace the normal dot in NOON, in order to represent a retroflexed /n/. (Others write this as NOON WITH SMALL TAH, or RNOON as I think the Standard calls it.) I believe it would be _wrong_ to tell people to encode that usage as NOON GHUNNA + SUKUN, because the little ring mark is not functioning at all as a vowel-class diacritic, it is comparable to the dots inherent in the standard letters. (Whether it should be encoded at all - as I don't think it is well-established anywhere - or treated as a glyph variant of RNOON, is a separate question.)

But if someone does come with a proposal for NOON WITH RING ABOVE (note that this is not the existing NOON WITH RING, which has a ring _below_), it should be considered on the merits of its evidence for the letter, and _not_ rejected as an already-representable sequence. In particular, users of this NOON WITH RING ABOVE will want it to carry vowel diacritics; this is a clue that it should not itself be encoded _using_ a vowel(-like) diacritic.

(I know there's no fundamental Unicode reason this couldn't be done, but I believe it would unnecessarily violate the behavior pattern of the Arabic script; while things are admittedly not completely clean and logical as they stand, we should strive to maintain the script's own internal patterns as far as possible when extending it.)

JK
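The contrast between an atomic consonant and a consonant spelled with a vowel-class mark can be sketched in Python with the standard `unicodedata` module. This is an illustration added here, not part of the argument in the message: it assumes U+06BB RNOON as the atomic letter, U+06BA NOON GHUNNA + U+0652 SUKUN as the sequence spelling, and U+064E FATHA as the vowel the letter carries. The helper name `marks` is hypothetical.

```python
import unicodedata

# Assumed code points for the example.
RNOON = "\u06BB"        # ARABIC LETTER RNOON (atomic consonant)
NOON_GHUNNA = "\u06BA"  # ARABIC LETTER NOON GHUNNA (dotless noon)
SUKUN = "\u0652"        # ARABIC SUKUN (standing in for the ring mark)
FATHA = "\u064E"        # ARABIC FATHA (the vowel the consonant carries)


def marks(text):
    """Return the names of the nonspacing marks (category Mn) in text."""
    return [unicodedata.name(ch) for ch in text if unicodedata.category(ch) == "Mn"]


# Atomic consonant carrying a vowel: one mark, and it is the vowel.
atomic = RNOON + FATHA

# Sequence-encoded consonant carrying the same vowel: two vowel-class
# marks on one base, only one of which is really acting as a vowel.
sequence = NOON_GHUNNA + SUKUN + FATHA

print(marks(atomic))
print(marks(sequence))

# Normalization sorts adjacent marks by canonical combining class, so the
# stored order of SUKUN and FATHA is not guaranteed to survive.
normalized = unicodedata.normalize("NFC", sequence)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

Under these assumptions the sequence spelling puts two vowel-class marks on one base, and normalization moves the SUKUN after the FATHA, so the mark meant as part of the letter no longer sits next to its base. The atomic letter avoids both effects.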