Re: "A Programmer's Introduction to Unicode"

From: Manish Goregaokar <manish_at_mozilla.com>
Date: Mon, 13 Mar 2017 22:57:03 -0700

Ah, it was what I thought you were talking about -- I wasn't aware they
were considered word boundaries :)

Thanks for the links!

On Mar 13, 2017 4:54 PM, "Richard Wordingham" <
richard.wordingham_at_ntlworld.com> wrote:

On Mon, 13 Mar 2017 15:26:00 -0700
Manish Goregaokar <manish_at_mozilla.com> wrote:

> Do you have examples of AA being split that way (and further reading)?
> I think I'm aware of what you're talking about, but would love to read
> more about it.

Just googling for the three words 'Sanskrit', 'sandhi' and 'resolution'
brings up plenty of papers and discussion, e.g. Hellwig's at
http://ltc.amu.edu.pl/book/papers/LRL-1.pdf and a multi-author paper at
https://www.aclweb.org/anthology/C/C16/C16-1048.pdf.

There are even technical terms for the text before and after splitting.
Unsplit text is 'samhita text', and text split into words is 'pada text'.

Richard.
Received on Tue Mar 14 2017 - 00:57:46 CDT
