Comments on draft Sinhala standard (L2/04-131 + L2/04-231)

Eric Muller, Adobe Systems Inc.
June 7, 2004

1.  Names of characters
2.  Spacing vowel signs
3.  Joiners to control shaping
4.  Typos
5.  Some questions
First, I want to congratulate you for working on clarifying the representation of Sinhala text using Unicode, and for submitting your draft for comments. I am confident that this will greatly improve the interchangeability of Sinhala text via computers.

Second, I readily admit that I am not a Sinhala speaker (nor writer), and that my comments are mostly based on analogy with other situations in Unicode; do not hesitate to correct my assumptions or biases.

1. Names of characters

N1. page 5, in the table of consonants, you use the names “ndja” and “nnda” where TUS 4.0 gives the annotations “nyja” and “nndda” in the code charts. The first is repeated in the first note, so at least that one is probably not a typo. Could you confirm that those are the names you prefer? I am confident the UTC would have no problem updating the annotations for the next version.

2. Spacing vowel signs

The amendment in L2/04-231 to use U+0020 SPACE as the base to display a spacing vowel sign is a vast improvement over the use of U+200C ZERO WIDTH NON-JOINER. However, the amendment also introduces a contrastive use of U+00A0 NO-BREAK SPACE as “virtual” base, which sounds risky.
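To make the candidate sequences concrete, here is a minimal Python sketch that builds a spacing vowel sign on each of the three bases mentioned above. The choice of U+0DCF as the sample vowel sign and the helper names are mine, for illustration only; they are not taken from the draft.

```python
import unicodedata

SPACE = "\u0020"  # base proposed in L2/04-231
NBSP = "\u00A0"   # contrastive "virtual" base in L2/04-231
ZWNJ = "\u200C"   # base used before the amendment

# Assumption: U+0DCF is used here only as a sample spacing vowel sign.
SAMPLE_VOWEL_SIGN = "\u0DCF"


def isolated_sign(base: str, sign: str = SAMPLE_VOWEL_SIGN) -> str:
    """Return the coded sequence <base, sign> (hypothetical helper)."""
    return base + sign


def describe(text: str) -> list[str]:
    """List each code point of text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for base in (SPACE, NBSP, ZWNJ):
        print(describe(isolated_sign(base)))
```

The three sequences differ only in their first code point, which is why a contrastive use of SPACE versus NO-BREAK SPACE would be easy to lose in ordinary text processing.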

I now believe that the whole area of displaying fragments of text (be they nonspacing marks in isolation, or constructs behaving typographically in a similar fashion, such as the superscript and subscript forms of RA in Devanagari or the repaya and the rakaransaya) is a delicate area that requires more work (see the separate submission, “Using SPACE as a base character”).

3. Joiners to control shaping

J1. page 14, section 5.6. I understand from additional discussion that forming the rakaransaya for a RAYANNA at the end of a conjunct is by far the most frequent case. There is also a great similarity with the (traditional) Malayalam RA and the Kannada RA, both in the conceptual behaviour and in the graphical appearance. For those two reasons, I would have expected that <Cons + al-lakuna + rayanna> would form a rakaransaya, while <Cons + al-lakuna + ZWNJ + rayanna> and <Cons + al-lakuna + ZWJ + rayanna> would not. Furthermore, this would be consistent with the general notion in Indic scripts that the joiners disable the formation of conjuncts, rather than enable it. The same applies to yansaya.
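The expectation in J1 can be written down as a small Python sketch. It only encodes the convention I am suggesting (the plain sequence forms the rakaransaya or yansaya, and either joiner prevents it); it is not the behaviour specified in the draft, and the function name and return strings are hypothetical.

```python
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA
YAYANNA = "\u0DBA"    # SINHALA LETTER YAYANNA
RAYANNA = "\u0DBB"    # SINHALA LETTER RAYANNA
ZWNJ = "\u200C"       # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"        # ZERO WIDTH JOINER
KAYANNA = "\u0D9A"    # SINHALA LETTER ALPAPRAANA KAYANNA, sample consonant


def suggested_form(text: str) -> str:
    """Classify <Cons, al-lakuna, [joiner,] rayanna|yayanna> under the
    convention suggested in J1 (hypothetical helper, not the draft's rule)."""
    if len(text) < 3 or text[1] != AL_LAKUNA or text[-1] not in (RAYANNA, YAYANNA):
        raise ValueError("not a Cons + al-lakuna [+ joiner] + rayanna/yayanna sequence")
    if len(text) == 3:
        # No joiner: the special form is produced by default.
        return "rakaransaya" if text[-1] == RAYANNA else "yansaya"
    if len(text) == 4 and text[2] in (ZWNJ, ZWJ):
        # A joiner disables the special form.
        return "no special form"
    raise ValueError("unexpected sequence")


if __name__ == "__main__":
    print(suggested_form(KAYANNA + AL_LAKUNA + RAYANNA))
    print(suggested_form(KAYANNA + AL_LAKUNA + ZWJ + RAYANNA))
```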

J2. page 15, section 5.7. The observation here is fundamentally the same as in J1, although I understand that repaya is the exception for an initial RAYANNA (explicit al-lakuna being the norm). My guess is that the right design is still to have ZWNJ and ZWJ prevent the formation of repaya, rather than the other way around.

J3. page 15, section 5.8. Same observation, but this time for ordinary conjuncts: you want C + virama + ZWJ + C to form a conjunct while there is no need for a ZWJ in the other scripts.
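As an illustration of the contrast in J3, the following Python sketch places a Devanagari conjunct sequence next to the corresponding Sinhala sequence as I read the draft. The particular consonant pair is my own choice of example, and the variable names are hypothetical.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

# Devanagari: KA + VIRAMA + SSA, no joiner needed to request the conjunct.
DEVANAGARI_SEQUENCE = "\u0915\u094D\u0937"

# Sinhala, as I understand section 5.8 of the draft: a ZWJ is required.
# Assumption: this consonant pair is only a sample.
SINHALA_DRAFT_SEQUENCE = "\u0D9A\u0DCA" + ZWJ + "\u0DC2"


def names(text: str) -> list[str]:
    """Return the Unicode character name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    print(names(DEVANAGARI_SEQUENCE))
    print(names(SINHALA_DRAFT_SEQUENCE))
```

The Sinhala sequence is one code point longer purely because of the joiner, which is the inconsistency with the other scripts that J3 points out.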

J4. L2/04-231, touching letters. A (probably naive) interpretation of the touching letters is that they act similarly to the half-forms of the other Indic scripts; along with the comments above, this would suggest the three-way distinction:

4. Typos

T1. page 4, 1st paragraph, 3rd line: change “syllabury” to “syllabary”. Note that Unicode actually characterizes Sinhala as an abugida rather than a syllabary; see TUS 4.0, section 6.1.

T2. page 6, note 1, 1st line: change “al-lakana” to “al-lakuna”.

T3. page 6, note 3, 3rd line: change “symbolises <glyph for U+0DBB> when by a consonant,” to “symbolises <glyph for U+0DBB> when preceded by a consonant,”

T4. page 6, note 3: the glyph for rayanna + al-lakuna has an incorrect placement of the al-lakuna. The same occurs on page 15, section 5.7, 1st paragraph.

T5. page 8, table 3, row 6, column 2: the first glyph is for U+0DAD, it should be the glyph for U+0DD9 instead.

T6. page 9, 3rd paragraph, 2nd/3rd line: after “and follow a consonant” add “in memory”, to make sure that we are speaking about the coded characters rather than their visual rendering.

T7. page 10, representative glyph for U+0DDA: the al-lakuna should be on the right of the dotted circle.

T8. page 11, bottom of the table: change “Concluded” to “Continued”

T9. page 14, note 2 (top of the page), second line, list of vowel sign characters: add “+” between them to clearly show the characters; end of fourth line, the last glyph is that of U+0DDC, change to the glyph for U+0DDD.

T10. page 15, 1st paragraph of section 5.7, 1st line: change “The repaya <rakaransaya glyph> represents the letter” to “The repaya <repaya glyph> represents the letter”.

T11. page 15, section 5.8, end of 4th line: the al-lakuna is incorrectly positioned.

T12. page 15, 3rd paragraph of section 5.7, 1st line: change “in words wich as” to “in words such as”.

5. Some questions

Q1. Throughout the document, you speak of consonant clusters as if the only possibility is two consonants (for example in section 5.8: “conjunct letters are represented by the sequence Cons + al-lakuna + zwnj + Cons. The second consonant may optionally be followed by a vowel sign”). Is it just because the vast majority of clusters have only two consonants, but all the situations you describe are really meant to also apply to clusters that have more than two consonants?

Q2. You mention specifically the conjuncts of the form rayanna/yayanna/yayanna, displayed as repaya and yansaya on a plain yayanna. I also notice that in this example, you display the repaya above the yansaya. In the article on Sinhala by James Gair in Daniels and Bright, there is an example of a rayanna/taaluja sayanna/yayanna (page 408, second paragraph), but there the repaya is placed above the taaluja sayanna. What is in general the correct placement of the repaya? Did you call out rayanna/yayanna/yayanna specifically because the placement of the repaya is different from the general case?

Q3. In the description of touching letters in L2/04-231, you seem to consider that a conjunct alpapraana dayanna/mahaapraana dayanna (U+0DAF, U+0DB0) is essentially equivalent to sanyaka ddayanna (U+0DAC). Similarly with dantaja sayanna/dantaja sayanna (U+0DC3, U+0DC3) and muurdhaja sayanna. How strong is that equivalence?

