P1: Script Specific Danda and Double Danda

Last updated: November 6, 2004

1.  Problem
2.  Discussion
3.  Discussion
Document History

1. Problem

The danda and double danda are used in a number of scripts, not just in Devanagari.

2. Discussion

There is no question that other scripts use danda and double danda. The question is whether separate characters should be encoded for each script, or whether the existing characters in the Devanagari block (U+0964 । DEVANAGARI DANDA and U+0965 ॥ DEVANAGARI DOUBLE DANDA) should be considered shared across the scripts.
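To make the "shared" option concrete, the following minimal Python sketch lists the code points of a short Bengali sentence that ends with a danda. The sample sentence and the helper name describe are illustrative assumptions, not part of any proposal; the character names come from Python's standard unicodedata module.

```python
import unicodedata

DANDA = "\u0964"         # DEVANAGARI DANDA
DOUBLE_DANDA = "\u0965"  # DEVANAGARI DOUBLE DANDA

# Illustrative Bengali sentence; its final character is the shared danda.
sample = "\u0986\u09ae\u09bf \u09ad\u09be\u09a4 \u0996\u09be\u0987" + DANDA


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


for code_point, name in describe(sample):
    print(code_point, name)
```

Under the shared model, the last line of output names a Devanagari character even though every letter before it is Bengali; this is the situation that both sides of the argument below are reacting to.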

TDIL proposes to encode script specific characters:

At this point, there is no consensus about the best course of action. Below are the arguments for and against encoding the script specific characters.

Arguments in favor of encoding script specific characters

F1: this pattern has already been followed for other scripts, such as Myanmar and Hanunoo, which have their own danda. It has also been followed in the "core" Indic scripts for characters such as U+0950 ॐ DEVANAGARI OM / U+0AD0 ૐ GUJARATI OM.

Rebuttal:

F2: the typographic behaviour of the danda and the double danda changes across the scripts. Not only are the shapes somewhat different, but the positioning is also different.

Rebuttal: in a font that supports a single script, this is not a problem at all: whether the glyphs are mapped from the Devanagari block or the Bengali block has no impact. Font technologies which support multiple scripts also support some mechanism by which the same code point can result in different glyphs, possibly with different layout behavior.
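The mechanism mentioned in this rebuttal can be pictured as a glyph lookup keyed on both the code point and the script of the surrounding run. The sketch below is only an illustration under stated assumptions: the glyph names and the tables GLYPHS and DEFAULT_GLYPHS are hypothetical, and it does not reproduce the data format of any actual font technology. The tags "deva" and "beng" follow the style of OpenType script tags.

```python
# Hypothetical glyph table for a multi-script font: the same code point
# maps to a different glyph depending on the script of the run it is in.
GLYPHS = {
    (0x0964, "deva"): "danda.deva",
    (0x0964, "beng"): "danda.beng",
    (0x0965, "deva"): "dbldanda.deva",
    (0x0965, "beng"): "dbldanda.beng",
}

# Fallback used when the font has no script specific variant.
DEFAULT_GLYPHS = {0x0964: "danda.deva", 0x0965: "dbldanda.deva"}


def select_glyph(code_point, script_tag):
    """Pick a glyph name for a shared danda, given the script of its run."""
    return GLYPHS.get((code_point, script_tag), DEFAULT_GLYPHS.get(code_point))


print(select_glyph(0x0964, "beng"))
print(select_glyph(0x0964, "guru"))
```

In this sketch the shape and positioning differences live entirely in the font data, so the encoded text does not need to change from one script to another.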

F3: in multi-script text, handling an occurrence of a danda between characters of different scripts (not an uncommon occurrence) requires some care, so that it is rendered using the same font as the text it belongs to.

Rebuttal: this is not that difficult: the danda goes with the text that is before it. And anyway, the problem is not specific to the danda; it also applies to other punctuation marks (comma, period, etc.), so a solution that also works for the danda has to exist anyway.
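The rule "the danda goes with the text that is before it" can be sketched as a small script resolution pass. The code below is a minimal illustration, not a description of any existing implementation: the abbreviated table BLOCKS and the function names are assumptions made for the example, and a real implementation would use the full script data rather than a handful of block ranges.

```python
# Hypothetical, abbreviated block table: (first, last, script label).
BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
]

# The shared dandas have no script of their own under the shared model.
SHARED = {0x0964, 0x0965}


def block_script(ch):
    """Return the script label for ch, or None if it is shared or unlisted."""
    cp = ord(ch)
    if cp in SHARED:
        return None
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def resolve_scripts(text):
    """Pair each character with a script; script-less characters
    (dandas, spaces, other punctuation) inherit from the preceding text."""
    resolved = []
    current = None
    for ch in text:
        script = block_script(ch)
        if script is not None:
            current = script
        resolved.append((ch, current))
    return resolved


# Bengali KA, danda, space, Devanagari KA.
for ch, script in resolve_scripts("\u0995\u0964 \u0915"):
    print(f"U+{ord(ch):04X}", script)
```

Here the danda and the space after it are resolved to Bengali, so they would be rendered with the same font as the Bengali text they close. The same pass handles commas and periods, which is the point of the rebuttal.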

F4: users are often confused when they don't see the danda and double danda in the code chart for the Bengali (or Gurmukhi, ...) block.

Rebuttal: for most users, the code charts are not the first tool they do or should use. The layout of their keyboard is much more relevant to them, and it will show the danda and the double danda. We can assume that users who do need to dig in the code charts are also willing to read, e.g., chapter 9.

F5: users can be misled when they look at the Devanagari code chart and see "DEVANAGARI DANDA"; they can easily interpret the name as restricting the use of the character to Devanagari.

Rebuttal: this can easily be addressed by an annotation of the character, right next to the name, such as "not script specific despite its name, also used with the other Indic scripts", as well as text in chapter 9.

Arguments against

A1: this pattern (shared characters) is already being followed for other punctuation characters which are used across the scripts, e.g. period, comma, question mark.

Rebuttal:

A2: the shared nature of U+0964 and U+0965 has already been recognized, in data and in implementations. Thus, encoding script specific characters would amount to a disunification, with the problems this usually causes. Even if only a small amount of data would suffer from disunification today, the standardization process takes a fairly long time, so the amount of data with a problematic representation would be significant by the time the new characters could be used.

Rebuttal: Unicode will be used for centuries. Ten years of Devanagari-encoded dandas in Oriya text is of little consequence. The disunification costs for this fix are insignificant compared to the benefits.

3. Discussion

First draft of a paper for the UTC.


Document History

Revision  Date              Comments
1         November 6, 2004  Added a draft for a UTC paper
1         August 31, 2004   Initial version