Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Peter Kirk
Date: Fri Oct 10 2003 - 05:11:23 CST

On 09/10/2003 21:22, Gautam Sengupta wrote:

> ...
> Yes, but not just programmers who are concerned with how a Unicode
> text should be encoded, but also those who are going to have to
> process these texts for various purposes. Let us first introduce a
> small notational convention and then consider a rather minor example.
> Let the lowercase vowels henceforth denote *combining* vowels. In
> Bangla K+R+i and J+aa+I mean "I do" and "I go" respectively. Given
> these two forms as input, a morphological analyzer should ideally
> yield the following analyses: KRi = KR<VIRAMA> + I, JaaI = Jaa + I. (I
> am assuming orthographic - not phonemic/phonetic - input-output). In
> other words, the analyzer would have to insert an explicit virama
> after KR and somehow recognize the final <i> in KRi as <I>.
> Now let's consider the same pair of inputs in *my* representation.
> They would be K+R+VIRAMA+I and J+VIRAMA+AA+I. All that the
> morphological analyzer would have to do is chop off the rightmost <I>.
> The leftovers are exactly what we need: K+R+VIRAMA and J+VIRAMA+AA.
> Isn't it amazing how evidence from diverse fields of inquiry seems to
> converge on the *correct* solution?
> >
> > I hope this makes sense...
> -Gautam

It would surely be trivial for any morphological analyser to treat the
combining vowel i as a ligature or contraction of <VIRAMA, I>, expand it
into that sequence, and then analyse the expanded form. Any
morphological analyser is going to have to deal with ligatures and
contractions. The expansion could be programmed as a morphophonemic
contraction, even if that is not strictly correct as a linguistic
description.
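
A minimal sketch of that preprocessing step, in Python, might look like
the following. It uses the token notation from this thread (lowercase
for combining vowels, uppercase for independent vowels) rather than real
code points, and it treats every combining vowel as a contraction of
<VIRAMA, independent vowel>. The names CONTRACTIONS, expand_contractions
and split_final_I are hypothetical, chosen only for illustration, and
the vowel table covers just the two vowels in the examples.

```python
VIRAMA = "VIRAMA"

# Assumed table: combining vowel -> corresponding independent vowel.
# Only the two vowels used in the thread's examples are listed.
CONTRACTIONS = {"i": "I", "aa": "AA"}


def expand_contractions(tokens):
    """Replace each combining vowel with the sequence <VIRAMA, VOWEL>."""
    expanded = []
    for token in tokens:
        if token in CONTRACTIONS:
            expanded.extend([VIRAMA, CONTRACTIONS[token]])
        else:
            expanded.append(token)
    return expanded


def split_final_I(tokens):
    """Expand contractions, then chop off a final <I> if there is one.

    Returns (stem, suffix); suffix is None when no final <I> is found.
    """
    expanded = expand_contractions(tokens)
    if expanded and expanded[-1] == "I":
        return expanded[:-1], "I"
    return expanded, None


print(split_final_I(["K", "R", "i"]))   # stem K+R+VIRAMA, suffix I
print(split_final_I(["J", "aa", "I"]))  # stem J+VIRAMA+AA, suffix I
```

With the expansion done first, the suffix-stripping step is the same
"chop off the rightmost <I>" operation described in the quoted message,
and the stems come out as K+R+VIRAMA and J+VIRAMA+AA.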

Peter Kirk
