Re: Devanagari Letter Short A

From: Philippe Verdy
Date: Mon Feb 16 2004 - 06:16:31 EST


    My understanding of the Indian scripts encoded in Unicode is that the mapping
    from ISCII to Unicode is not a straightforward one-to-one mapping, because ISCII
    uses a contextual encoding for characters (allowing shifts between several
    scripts) and includes some rich-text features.

    The ISCII character model is not exactly the same as the Unicode character
    model, even though there was an attempt to make this mapping as simple as
    possible by allocating the Unicode code points for each individual
    ISCII-supported script in the same relative order, leaving gaps in the
    Unicode-encoded scripts for ISCII characters that are not used in one specific
    script.
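
    As a rough illustration of this parallel layout (a sketch only, assuming
    Python's standard unicodedata module; the names BLOCK_BASES and
    parallel_names are made up for this example, and the block start values are
    the ones shown in the Unicode code charts), the same offset can be looked up
    in several Indic blocks. A result of None marks a gap left in that script:

```python
import unicodedata

DEVANAGARI_BASE = 0x0900

# Block start code points as listed in the Unicode code charts.
BLOCK_BASES = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Tamil": 0x0B80,
}


def parallel_names(devanagari_cp):
    """Return the character name found at the same block offset in each script.

    The value is None when the code point at that offset is unassigned,
    i.e. when the script has a gap at that position.
    """
    offset = devanagari_cp - DEVANAGARI_BASE
    return {
        script: unicodedata.name(chr(base + offset), None)
        for script, base in BLOCK_BASES.items()
    }


if __name__ == "__main__":
    # U+0915 is DEVANAGARI LETTER KA, U+0916 is DEVANAGARI LETTER KHA.
    for cp in (0x0915, 0x0916):
        print(f"offset 0x{cp - DEVANAGARI_BASE:02X}")
        for script, name in parallel_names(cp).items():
            print(f"  {script}: {name}")
```

    Tamil has no letter KHA, so the Tamil entry for the second offset comes back
    as None while the other three scripts report their own KHA.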

    A good reference for how Indian scripts are encoded in Unicode is Chapter 9 of
    the Unicode 4.0 standard.

    In summary, the Unicode model for Devanagari:
    - uses consonant letters with an implied (default) vowel A, modified by the
    next encoded dependent vowel sign (matra) that creates a graphic conjunct with
    the consonant, or
    - uses half-forms of consonants to drop the implied vowel in initial
    consonants, or
    - uses a virama (halant), U+094D, to mark other omissions of the implied vowel
    on dead consonant letters (most often on final consonants, but this also
    occurs on initial or medial consonants), by removing the final stem of the
    full (live) consonant, which normally also depicts a phonetic syllable
    boundary with a necessary vowel. The virama thus allows creating conjuncts
    with following dead or live consonants, and normally attaches both consonant
    letters into the same syllable or conjunct.
    - in some cases, the omission of the implied vowel must not create a ligated
    conjunct, so the virama still needs to represent the omission of the vowel
    without creating a conjunct that would break the perceived phonetics; a ZWNJ
    is then used between the dead consonant (consonant letter + virama) and the
    next live consonant.
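
    The encoded sequences behind these cases can be listed with a small sketch
    (assuming Python's standard unicodedata module; the helper name describe and
    the choice of KA and SSA as sample letters are only for illustration). It
    shows code points only; whether a conjunct is actually drawn depends on the
    font and the rendering engine:

```python
import unicodedata

KA = "\u0915"            # DEVANAGARI LETTER KA (live consonant, implied vowel A)
SSA = "\u0937"           # DEVANAGARI LETTER SSA
VOWEL_SIGN_I = "\u093F"  # dependent vowel sign (matra)
VIRAMA = "\u094D"        # DEVANAGARI SIGN VIRAMA (halant)
ZWNJ = "\u200C"          # ZERO WIDTH NON-JOINER


def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


samples = {
    "consonant + matra": KA + VOWEL_SIGN_I,
    "dead + live consonant (conjunct allowed)": KA + VIRAMA + SSA,
    "dead + ZWNJ + live consonant (no conjunct)": KA + VIRAMA + ZWNJ + SSA,
}

if __name__ == "__main__":
    for label, text in samples.items():
        print(label)
        for line in describe(text):
            print("  " + line)
```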

    There's a U+0905 pseudo-consonant /a/, which is used in the absence of a
    phonetic consonant, but it follows the same encoding rule as other consonant
    letters /*a/, i.e. coding another isolated vowel requires coding /a/ before
    the vowel sign (matra). This encodes approximately the same thing as isolated
    vowels, except that the intended rendering is different.

    U+0904 DEVANAGARI LETTER SHORT A is used only for the case of an independent
    vowel. It can be "viewed" as a conjunct of the independent vowel U+0905
    DEVANAGARI LETTER A and the dependent vowel sign U+0946 DEVANAGARI VOWEL SIGN
    SHORT E (noted "for transcribing Dravidian vowels" in the Unicode charts). I
    don't know why this is not documented, because I can find various sources that
    use <U+0904> or <U+0905,U+0946>, which have exactly the same rendering and
    probably the same meaning and usage. I think that U+0946 was added in ISCII 1991
    but was absent from ISCII 1988 (to be verified; I don't have the ISCII 1988
    reference document), so U+0904 has survived just to allow a mostly one-to-one
    mapping with ISCII 1988. But the addition of U+0946

    Maybe I'm wrong here, and there are some reasons for this choice. There's no
    canonical or compatibility equivalence defined between <U+0904> and
    <U+0905,U+0946> (I think it's too late to define one: ISCII 1988 has been used
    consistently before, and the Unicode stability policy now forbids defining new
    equivalences between them).
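
    The absence of any defined equivalence can be checked with a short sketch
    (assuming Python's standard unicodedata module and whatever Unicode version it
    ships with; the helper name equivalent is just for this example). Neither
    canonical nor compatibility normalization maps one spelling to the other:

```python
import unicodedata

SHORT_A = "\u0904"               # DEVANAGARI LETTER SHORT A
A_PLUS_SHORT_E = "\u0905\u0946"  # LETTER A + VOWEL SIGN SHORT E


def equivalent(a, b, form):
    """True if a and b are identical after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(form, equivalent(SHORT_A, A_PLUS_SHORT_E, form))
    # An empty string here means U+0904 has no decomposition mapping at all.
    print(repr(unicodedata.decomposition(SHORT_A)))
```

    So a search or comparison that relies on normalization alone treats the two
    spellings as different strings, even where they render identically.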

    ----- Original Message -----
    From: "Ernest Cline" <>
    To: "Unicode List" <>
    Sent: Monday, February 16, 2004 6:28 AM
    Subject: Devanagari Letter Short A

    > I've been trying to make sense of the Indian scripts, but am
    > having one small difficulty. I can't seem to find the ISCII 1991
    > equivalent for U+0904 (DEVANAGARI LETTER SHORT A).
    > Is this a character that is part of the set accessed by the
    > extended code (xF0) or was this part of the ISCII 1988
    > standard that did not survive the changes to ISCII 1991?
    > Alternatively, does ISCII encode this as xA4 + xE0 as this
    > would seem to generate the proper glyph even tho it
    > violates the syllable grammar given in Section 8 of ISCII?
    > Or even more alternatively, am I just missing something
    > that should be obvious, but which for some reason I can't see?
    > Even with the slight differences in the naming conventions
    > between ISCII and Unicode, I don't seem to be misplacing
    > any of the other vowels or consonants.
    > Ernest Cline
