| 1. | Problem |
| 2. | Discussion |
| 3. | Consensus (tentative) |
| 4. | Discussion |
| 5. | Discussion |
Sindhi writes implosives using decorated forms of the Devanagari letters GA, JA, DDA, and BA, where the decoration is a line under the letter.
TDIL proposes to encode specific characters:
p979 DEVANAGARI CONSONANT
These forms are indeed used in writing Sindhi (although no evidence has been presented so far). This is an instance of the general situation where the adaptation of a script to write a language creates new forms by adding some mark (e.g. accent, nukta) to existing base forms. From a character encoding point of view, there are two approaches:
encode one character for each combination of a base form and the mark, as proposed by TDIL, with neither a canonical nor a compatibility decomposition
represent each combination by a combining sequence, made of the base letter and a combining character for the mark. This approach does not negate the distinct phonemic status of the base character alone and of the combination.
The combining sequence approach is the preferred approach in general, because it allows new combinations to be put to use as needed without encoding additional characters (e.g. if some language not yet considered turns out to use a new combination of existing parts).
There is no evidence that the combining sequence approach causes any problem of implementation, nor is the model a departure for Devanagari (witness the nukta).
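The nukta precedent can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the choice of DEVANAGARI LETTER QA (U+0958) as the example is ours, not the proposal's. It shows that an existing precomposed nukta letter is canonically equivalent to a base letter followed by the combining nukta, and that normalization yields the combining sequence.

```python
import unicodedata

QA = "\u0958"  # DEVANAGARI LETTER QA, a precomposed nukta letter

# Canonical decomposition recorded in the character database:
# base letter KA (U+0915) followed by NUKTA (U+093C).
decomp = unicodedata.decomposition(QA)

# U+0958 is excluded from composition, so both normalization forms
# produce the two-character combining sequence.
nfd = unicodedata.normalize("NFD", QA)
nfc = unicodedata.normalize("NFC", QA)

print(decomp)
print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfc])
```

In other words, Devanagari text in normalized form already carries such letters as base plus mark, which is the model the combining sequence approach would extend to the implosives.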
The Sindhi implosives will be represented by combining sequences.
The next question is which combining character should be used. We have at least four possibilities:
Using the ANUDATTA could be problematic, e.g. in text mixing Sindhi and Sanskrit: if the same combining character marks both a Sindhi implosive and a Sanskrit accent, searching becomes much more complicated. On the other hand, Unicode has a long tradition of encoding marks by their shape rather than by their function.
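The searching concern can be illustrated with a small sketch. It assumes, purely for illustration, that ANUDATTA (U+0952) were chosen as the implosive mark; the pattern and the sample strings are hypothetical and not taken from any real corpus.

```python
import re

ANUDATTA = "\u0952"
# Base letters of the four Sindhi implosives: GA, JA, DDA, BA.
IMPLOSIVE_BASES = "\u0917\u091C\u0921\u092C"

# Hypothetical search for "any Sindhi implosive", under the assumption
# that the implosive mark is ANUDATTA.
implosive_pattern = re.compile(f"[{IMPLOSIVE_BASES}]{ANUDATTA}")

sindhi_like = "\u092C" + ANUDATTA    # BA + mark, meant as an implosive
sanskrit_like = "\u0917" + ANUDATTA  # GA + mark, meant as an accent
text = sindhi_like + " " + sanskrit_like

matches = implosive_pattern.findall(text)
print(len(matches))
```

Both sequences match, because nothing in the encoded text distinguishes the two functions of the mark. A search tool would need outside knowledge (such as language tagging) to tell them apart.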
Another consideration in the choice of a combining character is its properties, and in particular its combining class. Here are the combining classes of the relevant characters:
| Combining class | Character |
| 7 | U+093C ◌़ DEVANAGARI SIGN NUKTA |
| 9 | U+094D ◌् DEVANAGARI SIGN VIRAMA |
| 220 | U+0331 ◌̱ COMBINING MACRON BELOW |
| 220 | U+0332 ◌̲ COMBINING LOW LINE |
| 220 | U+0952 ◌॒ DEVANAGARI STRESS SIGN ANUDATTA |
One implication of combining classes is that, in a normalized representation, a sequence of these combining characters following a base letter is reordered so that all the nukta occurrences come first, all the virama occurrences come next, and all the other occurrences come last (keeping their relative order).
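The reordering described above can be sketched as a stable sort of each run of combining marks by combining class. The helper `canonical_reorder` below is a hypothetical name and covers only the reordering step, not decomposition, so it agrees with NFD only for text that contains no decomposable characters. The combining classes are read from Python's standard `unicodedata` module.

```python
import unicodedata

def canonical_reorder(text):
    """Stable-sort each run of combining marks by combining class."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A character of class 0 (a starter) ends the current run.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# GA, then ANUDATTA (220), VIRAMA (9), MACRON BELOW (220), NUKTA (7).
sample = "\u0917\u0952\u094D\u0331\u093C"
classes = [unicodedata.combining(c) for c in sample]
reordered = canonical_reorder(sample)

print(classes)
print([f"U+{ord(c):04X}" for c in reordered])
print(reordered == unicodedata.normalize("NFD", sample))
```

Because Python's `sorted` is stable, the two class-220 marks keep their original relative order (ANUDATTA before MACRON BELOW), while the nukta and the virama move ahead of them.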
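Since the line under we are seeking behaves very much like a nukta, it would seem desirable for this character to have the same combining class, 7. The existing line-below candidates listed above (MACRON BELOW, LOW LINE, ANUDATTA) all have class 220, which strongly suggests the encoding of a new character.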
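One practical consequence of the class difference shows up when the marked consonant is followed by a virama. The sketch below uses ANUDATTA only as a stand-in for any class-220 mark, and the sequences are illustrative rather than attested Sindhi spellings.

```python
import unicodedata

GA, KA = "\u0917", "\u0915"
NUKTA, VIRAMA, ANUDATTA = "\u093C", "\u094D", "\u0952"

def nfd_codepoints(s):
    """Return the NFD form of s as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", s)]

# A class-7 mark stays next to its base letter, ahead of the virama.
with_class_7 = nfd_codepoints(GA + NUKTA + VIRAMA + KA)

# A class-220 mark typed before the virama is moved after it.
with_class_220 = nfd_codepoints(GA + ANUDATTA + VIRAMA + KA)

print(with_class_7)
print(with_class_220)
```

With a class-220 mark, normalized text places the virama between the consonant and the mark that is meant to modify it. A mark with class 7 would not be reordered past the virama, which matches how the nukta behaves.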
| Revision | Date | Comments |
| August 31, 2004 | Initial version |