P7: Sindhi implosives

Last updated: August 31, 2004

1.  Problem
2.  Discussion
3.  Consensus (tentative)
4.  Discussion
5.  Discussion
Document History

1. Problem

Sindhi writes its implosives with decorated forms of Devanagari GA, JA, DDA, and BA, where the decoration is a line below the letter.

2. Discussion

TDIL proposes to encode specific characters:

  p979 DEVANAGARI CONSONANT
   * used for Sindhi implosive placed just below the consonant
  p97A DEVANAGARI CONSONANT 
   * used for Sindhi implosive placed just below the consonant
  p97B DEVANAGARI CONSONANT 
   * used for Sindhi implosive placed just below the consonant
  p97C DEVANAGARI CONSONANT 
   * used for Sindhi implosive placed just below the consonant

These forms are indeed used in writing Sindhi (although no evidence has been presented so far). This is an instance of the general situation in which adapting a script to write a language creates new forms by adding some mark (e.g. an accent or a nukta) to existing base forms. From a character encoding point of view, there are two approaches: encode each new form as a separate character, as TDIL proposes, or represent it as a combining sequence of an existing base character followed by a combining mark.

The combining sequence approach is the preferred one in general, because it allows new combinations to be put to use as needed without encoding additional characters (e.g. if it is discovered that some language not yet considered uses a new combination of existing parts).

There is no evidence that the combining sequence approach causes any problem of implementation, nor is the model a departure for Devanagari (witness the nukta).
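
As an illustration of the nukta precedent, here is a minimal Python sketch using only the standard unicodedata module. It checks that U+0958 DEVANAGARI LETTER QA is canonically equivalent to KA followed by NUKTA, and that the normalized form is the combining sequence. The helper name same_text is hypothetical and chosen only for this example.

```python
import unicodedata

# U+0958 DEVANAGARI LETTER QA has a canonical decomposition to KA + NUKTA.
QA_PRECOMPOSED = "\u0958"
QA_SEQUENCE = "\u0915\u093C"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA


def same_text(a: str, b: str) -> bool:
    """Compare two strings under canonical equivalence (hypothetical helper)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two spellings compare equal once both are normalized.
print(same_text(QA_PRECOMPOSED, QA_SEQUENCE))

# Even NFC yields the sequence, because U+0958 is excluded from composition.
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", QA_PRECOMPOSED)])
```

A Sindhi implosive represented as a base consonant plus a combining mark would follow the same pattern, with the mark still to be chosen.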

3. Consensus (tentative)

The Sindhi implosives will be represented by combining sequences.

4. Discussion

The next question is which combining character should be used. We have at least four possibilities:

Using the ANUDATTA could be problematic, e.g. in text mixing Sindhi and Sanskrit: with the same combining character serving both as an implosive mark and as an accent mark, searching could become much more complicated. On the other hand, Unicode has a long tradition of encoding marks by their shape rather than by their function.

5. Discussion

Another consideration in choice of a combining character is its properties, and in particular its combining class. Here are the combining classes of relevant characters:

7 U+093C ◌़ DEVANAGARI SIGN NUKTA
9 U+094D ◌् DEVANAGARI SIGN VIRAMA
220 U+0331 ◌̱ COMBINING MACRON BELOW
220 U+0332 ◌̲ COMBINING LOW LINE
220 U+0952 ◌॒ DEVANAGARI STRESS SIGN ANUDATTA
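
The values above can be checked against the Unicode Character Database shipped with Python. This is a small sketch using the standard unicodedata module; the names CANDIDATES and combining_classes are hypothetical, introduced only for this example.

```python
import unicodedata

# The five characters listed above, in the same order.
CANDIDATES = ["\u093C", "\u094D", "\u0331", "\u0332", "\u0952"]


def combining_classes(chars):
    """Return (combining class, code point, name) for each character."""
    return [
        (unicodedata.combining(c), f"U+{ord(c):04X}", unicodedata.name(c))
        for c in chars
    ]


for ccc, code_point, name in combining_classes(CANDIDATES):
    print(ccc, code_point, name)
```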

One implication of combining classes is that in a normalized representation, a sequence of occurrences of the characters above will be reordered such that all the nukta occurrences come first, all the virama occurrences come next, and all the other occurrences come last (keeping their relative order).
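
The reordering described above can be observed directly. The following Python sketch builds a deliberately artificial sequence (not a real Sindhi or Sanskrit spelling) with the marks in "wrong" order and normalizes it. The variable names and the code_points helper are assumptions made for this illustration.

```python
import unicodedata

GA = "\u0917"            # DEVANAGARI LETTER GA (base, class 0)
NUKTA = "\u093C"         # class 7
VIRAMA = "\u094D"        # class 9
MACRON_BELOW = "\u0331"  # class 220
ANUDATTA = "\u0952"      # class 220


def code_points(text: str) -> list:
    """Render a string as a list of U+XXXX labels (hypothetical helper)."""
    return [f"U+{ord(c):04X}" for c in text]


# Artificial input: the class-220 marks come first, the nukta comes last.
typed = GA + ANUDATTA + MACRON_BELOW + VIRAMA + NUKTA
normalized = unicodedata.normalize("NFD", typed)

print(code_points(typed))
print(code_points(normalized))

# Nukta first, then virama, then the class-220 marks in their original order.
assert normalized == GA + NUKTA + VIRAMA + ANUDATTA + MACRON_BELOW
```

Because ANUDATTA and COMBINING MACRON BELOW share class 220, normalization never swaps them with each other, but both always end up after a nukta or virama on the same base.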

Since the line below that we are seeking behaves very much like a nukta, it would seem desirable for this character to have the same combining class, 7. None of the existing line-shaped marks listed above has that class (all three are 220), which strongly suggests the encoding of a new character.


Document History

Revision  Date             Comments
1         August 31, 2004  Initial version