Public Review Issue #59

Disunification of Dandas

The UTC is considering the question of disunifying the characters U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA from their counterparts in several other Indic scripts. Feedback on this issue, for or against the disunification, is being sought. For example, evidence for disunification would be data showing that a Bengali danda usually has a different shape than the Devanagari danda. Evidence against disunification would be data showing that a Bengali danda usually has the same shape as the Devanagari danda.

The UTC would also particularly like to have specific information on the impact of making the decision either way. Please accompany your feedback with information on the source of your data and indicate the extent and nature of your experience with Indian data processing.

The Current State of the Standard

Currently the standard recommends using U+0964 and U+0965 whenever a danda or double danda is needed in any of the other major scripts of India. In particular, this covers the Indic scripts that have been encoded in Unicode since Version 1.0: Devanagari (of course), Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. Sinhala likewise has no separately encoded danda or double danda. The Indian script Syloti Nagri, which will be included in Unicode in Version 4.1, also has no separately encoded danda or double danda.

The large majority of currently encoded punctuation symbols are common to multiple scripts: Unicode 4.1.0 will contain 439 such general punctuation symbols, compared to only a handful of script-specific punctuation marks.

Several characters analogous to the danda and double danda are encoded for other scripts. A list is provided at the end of this document.

The Case for Disunification

Some users feel that it is important to preserve, predictably, the stylistic differences among dandas according to the scripts they are used with. Under the current recommendation to use U+0964 DEVANAGARI DANDA and U+0965 DEVANAGARI DOUBLE DANDA in plain text (that is, text without font or language information attached), dandas used with different scripts in a piece of text may be restricted to a single, uniform glyph in display, based on the angled "stem" that the Devanagari danda shares with letters such as KA and PA. While some font technologies can select glyphs according to language, language information, like specific font information, is not always available, not always interchangeable, and not always carried with the text.

The current recommendation may also confuse people looking at character names, since users of scripts other than Devanagari see "DEVANAGARI DANDA" in a non-Devanagari context. Naming the characters simply DANDA and DOUBLE DANDA and encoding them among the general punctuation from the start might have avoided that confusion, but such a change is not possible now.
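As a minimal illustration of the naming issue, assuming Python 3 and its standard unicodedata module (which exposes Unicode character names), the sketch below lists the code points of a short Bengali word (আমি, "ami") followed by a danda encoded as the current recommendation prescribes. The Bengali letters report Bengali names, while the danda reports DEVANAGARI DANDA.

```python
import unicodedata

# A short Bengali word followed by a danda. Following the current
# recommendation, the danda is U+0964, since Bengali has no danda of its own.
text = "\u0986\u09ae\u09bf\u0964"

for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Prints:
# U+0986 BENGALI LETTER AA
# U+09AE BENGALI LETTER MA
# U+09BF BENGALI VOWEL SIGN I
# U+0964 DEVANAGARI DANDA
```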

Furthermore, there is an inconsistency between the way danda and double danda are encoded in the Indian scripts (unified) and the way they are encoded in other Brahmi-derived scripts and in Kharoshthi (not unified). This inconsistency in treatment is also confusing to users of these scripts.

The Case Against Disunification

The current recommendation for the use of danda and double danda in Indian scripts has been in place for over a decade. Changing it would require changes to existing implementations and might invalidate some existing data. A stable encoding with known anomalies is preferable to destabilizing the encoding in order to fix them after the fact.

There are known techniques for representing multiple glyph forms in fonts and picking them appropriately for different script contexts, so that the different forms of dandas in Devanagari and other Indian scripts are not an insuperable problem, even in unstyled plain text. There are solutions in place which deal with this issue. In any case, the ordinary forms of the danda and double danda (simply vertical lines) are similar enough in the Indian scripts proper that even non-differentiated glyphs are "good enough" for plain text purposes, and any other differences can be considered issues of font style in fine typography. In fact, introducing distinct dandas for each script might result in the same kind of problem that introducing distinct commas and full stops for each script would.
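The sketch below illustrates, under simplifying assumptions, one way a renderer could determine the script context of each danda in unstyled plain text: each danda is assigned the script of the nearest preceding Indic letter. It uses hard-coded Unicode block ranges as a stand-in for the Unicode Script property, and the names INDIC_BLOCKS, block_script, and danda_contexts are hypothetical. Choosing the actual glyph for each context is left to the font and is not shown.

```python
# Hypothetical block ranges (inclusive) for the Indian scripts named in this
# document; a real implementation would consult the Unicode Script property.
INDIC_BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
]

DANDAS = {"\u0964", "\u0965"}  # DEVANAGARI DANDA, DEVANAGARI DOUBLE DANDA


def block_script(ch):
    """Return the script of ch's block, or None for dandas and non-Indic characters."""
    if ch in DANDAS:
        return None  # the shared dandas must not establish a script context
    cp = ord(ch)
    for start, end, name in INDIC_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def danda_contexts(text, default="Devanagari"):
    """Return (index, script) for each danda, using the nearest preceding Indic letter."""
    result = []
    current = default  # used when no Indic letter precedes the danda
    for i, ch in enumerate(text):
        script = block_script(ch)
        if script is not None:
            current = script
        elif ch in DANDAS:
            result.append((i, current))
    return result


# Devanagari KA + danda, a space, then Bengali KA + danda.
print(danda_contexts("\u0915\u0964 \u0995\u0964"))
# Prints: [(1, 'Devanagari'), (4, 'Bengali')]
```

Because the dandas themselves fall in the Devanagari block, the sketch excludes them from script detection; otherwise every danda would appear to establish a Devanagari context.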

Use of danda and double danda in scripts other than Devanagari is uncommon in modern typography for Indian scripts, where they have mostly been replaced by the full stop and similar Western punctuation, although all of these scripts retain the dandas in representations of classical poetry. The issue therefore does not affect most current usage of these scripts.

Analogues in Other Scripts

The following analogues to danda and double danda in other scripts are encoded in the standard. Their typical visual appearances differ significantly from the Devanagari danda.

The Thai script has its own historical analogue encoded, U+0E5A THAI CHARACTER ANGKHANKHU.

The Tibetan script has its own historical analogues encoded, U+0F0D TIBETAN MARK SHAD and U+0F0E TIBETAN MARK NYIS SHAD.

The Myanmar script has its own historical analogues encoded, U+104A MYANMAR SIGN LITTLE SECTION and U+104B MYANMAR SIGN SECTION.

The Khmer script has its own historical analogues encoded, U+17D4 KHMER SIGN KHAN and U+17D5 KHMER SIGN BARIYOOSAN.

The Hanunoo script has its own historical analogues encoded, U+1735 PHILIPPINE SINGLE PUNCTUATION and U+1736 PHILIPPINE DOUBLE PUNCTUATION. These Philippine punctuation marks are explicitly intended for use with the other Philippine scripts as well, namely Tagalog, Buhid, and Tagbanwa.

The Kharoshthi script, which will be included in Unicode in Version 4.1, has its own historical analogues encoded, U+10A56 KHAROSHTHI PUNCTUATION DANDA and U+10A57 KHAROSHTHI PUNCTUATION DOUBLE DANDA.

Proposals in progress for Phags-pa, Cham, and Balinese also include analogous characters.
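For reference, the encoded analogues listed above can be collected into a small table and checked against the character names in Python's unicodedata module. This is a sketch assuming a Python build whose Unicode data includes Kharoshthi (Unicode 4.1 or later); the table layout is illustrative only. The proposed Phags-pa, Cham, and Balinese characters are omitted because they are not yet encoded.

```python
import unicodedata

# (script, code point, character name) for each encoded analogue listed above.
ANALOGUES = [
    ("Thai", 0x0E5A, "THAI CHARACTER ANGKHANKHU"),
    ("Tibetan", 0x0F0D, "TIBETAN MARK SHAD"),
    ("Tibetan", 0x0F0E, "TIBETAN MARK NYIS SHAD"),
    ("Myanmar", 0x104A, "MYANMAR SIGN LITTLE SECTION"),
    ("Myanmar", 0x104B, "MYANMAR SIGN SECTION"),
    ("Khmer", 0x17D4, "KHMER SIGN KHAN"),
    ("Khmer", 0x17D5, "KHMER SIGN BARIYOOSAN"),
    ("Hanunoo", 0x1735, "PHILIPPINE SINGLE PUNCTUATION"),
    ("Hanunoo", 0x1736, "PHILIPPINE DOUBLE PUNCTUATION"),
    ("Kharoshthi", 0x10A56, "KHAROSHTHI PUNCTUATION DANDA"),
    ("Kharoshthi", 0x10A57, "KHAROSHTHI PUNCTUATION DOUBLE DANDA"),
]

for script, cp, name in ANALOGUES:
    # Fails if the listed name does not match the Unicode Character Database.
    assert unicodedata.name(chr(cp)) == name, (script, hex(cp))
    print(f"U+{cp:04X} {name} ({script})")
```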