Re: "A Programmer's Introduction to Unicode"

From: Mark E. Shoulson <>
Date: Mon, 13 Mar 2017 20:20:25 -0400

A word ending in A *or* AA preceding a word beginning in A *or* AA will
all coalesce to a single AA in Sanskrit. That's four possibilities, and
that doesn't count a word ending in a consonant preceding a word
beginning in AA, which would be written the same. My memory is rusty,
so I should actually be looking things up, but I think these are valid:

न + अगच्छत् → नागच्छत्
न + आगच्छत् → नागच्छत्

(and indeed, आगच्छत् is the upasarga आ plus अगच्छत्, so there too the A
+ AA coalesced.) I should probably find you examples for all the other
possibilities. Sanskrit external vowel sandhi is straightforward
compared to consonant sandhi, but it frequently loses information:
A *or* AA plus I is E; A *or* AA plus U is O (you need A + O to get
AU).
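
To make the information loss concrete, here is a rough Python sketch of just
the coalescences mentioned above. The names join_a_sandhi, candidate_splits
and MERGED are made up for illustration; the table covers only a final A or
AA before an initial A, AA, I, U or O (treating AA + O like A + O is my
addition), and it ignores everything else a real sandhi engine would need
(long I and U, consonant sandhi, visarga, and so on).

```python
# Code points involved (comments use the Unicode character names).
A, AA, I, U, O = "\u0905", "\u0906", "\u0907", "\u0909", "\u0913"
AA_SIGN = "\u093E"  # DEVANAGARI VOWEL SIGN AA

# Initial vowel of the second word -> (dependent sign, independent letter)
# of the coalesced vowel. Only the combinations discussed above are listed.
MERGED = {
    A:  ("\u093E", "\u0906"),  # A/AA + A  -> AA
    AA: ("\u093E", "\u0906"),  # A/AA + AA -> AA
    I:  ("\u0947", "\u090F"),  # A/AA + I  -> E
    U:  ("\u094B", "\u0913"),  # A/AA + U  -> O
    O:  ("\u094C", "\u0914"),  # A/AA + O  -> AU (AA + O is an assumption here)
}


def join_a_sandhi(first, second):
    """Join two words when `first` ends in A or AA and `second` begins with
    one of the vowels in MERGED. Returns None for any other pair."""
    if not first or not second or second[0] not in MERGED:
        return None
    sign, letter = MERGED[second[0]]
    last = first[-1]
    if last == AA_SIGN:                 # consonant carrying the AA sign
        return first[:-1] + sign + second[1:]
    if "\u0915" <= last <= "\u0939":    # bare consonant KA..HA: inherent A
        return first + sign + second[1:]
    if last in (A, AA):                 # independent letter A or AA
        return first[:-1] + letter + second[1:]
    return None


def candidate_splits(joined, index):
    """List the four (first, second) spellings that join_a_sandhi would merge
    into the AA sign at `index`. These are orthographic candidates only; no
    check is made that either half is a real word."""
    if joined[index] != AA_SIGN:
        return []
    head, tail = joined[:index], joined[index + 1:]
    return [(head + end, start + tail)
            for end in ("", AA_SIGN)    # first word ends in A or in AA
            for start in (A, AA)]       # second word begins with A or AA


if __name__ == "__main__":
    print(join_a_sandhi("न", "अगच्छत्"))
    print(join_a_sandhi("न", "आगच्छत्"))
    for first, second in candidate_splits("नागच्छत्", 1):
        print(first, "+", second)
```

Both join_a_sandhi calls give नागच्छत्, and candidate_splits lists four
spellings that all rejoin to that same string, which is why the join cannot
be undone without knowing the words involved.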


On 03/13/2017 06:26 PM, Manish Goregaokar wrote:
> Do you have examples of AA being split that way (and further reading)?
> I think I'm aware of what you're talking about, but would love to read
> more about it.
> -Manish
> On Mon, Mar 13, 2017 at 2:47 PM, Richard Wordingham
> <> wrote:
>> On Mon, 13 Mar 2017 23:10:11 +0200
>> Khaled Hosny <> wrote:
>>> But there are many text operations that require access to Unicode code
>>> points. Take for example text layout, as mapping characters to glyphs
>>> and back has to operate on code points. The idea that you never need
>>> to work with code points is too simplistic.
>> There are advantages to interpreting and operating on text as though it
>> were in form NFD. However, there are still cases where one needs
>> fractions of a character, such as word boundaries in Sanskrit, though I
>> think the locations are liable to be specified in a language-specific
>> form. U+093E DEVANAGARI VOWEL SIGN AA can have a word boundary in it
>> in at least 4 ways.
>> Richard.
Received on Mon Mar 13 2017 - 19:20:49 CDT