Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Gautam Sengupta (gsghyd@yahoo.com)
Date: Sat Oct 11 2003 - 06:37:39 CST

Next message: Peter Kirk: "Re: Bangla: [ZWJ], [VIRAMA] and CV sequences"
Previous message: Gautam Sengupta: "RE: Bangla: [ZWJ], [VIRAMA] and CV sequences"
In reply to: Peter Kirk: "Re: Bangla: [ZWJ], [VIRAMA] and CV sequences"
Next in thread: Peter Kirk: "Re: Bangla: [ZWJ], [VIRAMA] and CV sequences"
Reply: Peter Kirk: "Re: Bangla: [ZWJ], [VIRAMA] and CV sequences"

--- Peter Kirk <peterkirk@qaya.org> wrote:
> On 09/10/2003 21:22, Gautam Sengupta wrote:
>
> > ...
> >
> > Yes, but not just programmers who are concerned
> with how a Unicode
> > text should be encoded, but also those who are
> going to have to
> > process these texts for various purposes. Let us
> first introduce a
> > small notational convention and then consider a
> rather minor example.
> >
> > Let the lowercase vowels henceforth denote
> *combining* vowels. In
> > Bangla K+R+i and J+aa+I mean "I do" and "I go"
> respectively. Given
> > these two forms as input, a morphological analyzer
> should ideally
> > yield the following analyses: KRi = KR<VIRAMA> +
> I, JaaI = Jaa + I. (I
> > am assuming orthographic - not phonemic/phonetic -
> input-output). In
> > other words, the analyzer would have to insert an
> explicit virama
> > after KR and somehow recognize the final <i> in
> KRi as <I>.
> >
> > Now let's consider the same pair of inputs in *my*
> representation.
> > They would be K+R+VIRAMA+I and J+VIRAMA+AA+I. All
> that the
> > morphological analyzer would have to do is chop
> off the rightmost <I>.
> > The leftovers are exactly what we need: K+R+VIRAMA
> and J+VIRAMA+AA.
> > Isn't it amazing how evidence from diverse fields
> of inquiry seem to
> > converge on the *correct* solution?
> > >
> > > I hope this makes sense...
> >
> > -Gautam
> >
>
> It would surely be trivial for any morphological
> analyser to understand
> i as a ligature or contraction of <VIRAMA, I>, split
> it into the
> sequence, and then analyse the version with the
> sequence. Any
> morphological analyser is going to have to deal with
> ligatures and
> contractions. It could be programmed as a
> morphophonemic contraction,
> even if that is not technically linguistically
> correct.

[Gautam]: I did hedge my claim by saying that I was
going to cite a rather minor example. But why would I
want to do this extra bit of computing - however
trivial - when I could have avoided it by adopting a
more "appropriate" encoding in the first place? After
all, what I am suggesting is that the VIRAMA model
once adopted ought to have been implemented in full.
Is there any particular reason why it should be
adopted for CC but not for CV sequences?
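
To make the "extra bit of computing" concrete, here is a minimal
Python sketch of the two steps under discussion: Peter's expansion of
a combining vowel into <VIRAMA, independent vowel>, followed by
chopping off the final <I>. The token lists, the COMBINING_VOWELS set
and both function names are illustrative assumptions that simply
follow the notation of this thread; they are not real Unicode code
points or any existing analyzer's API.

```python
# Notation from the thread: uppercase = consonant or independent vowel,
# lowercase = combining vowel, "VIRAMA" = explicit virama.
# This inventory is illustrative only, not a complete list for Bangla.
COMBINING_VOWELS = {"aa", "i", "ii", "u", "uu", "e", "o"}


def to_full_virama(tokens):
    """Rewrite every combining vowel v as the pair VIRAMA + V."""
    out = []
    for tok in tokens:
        if tok in COMBINING_VOWELS:
            out.extend(["VIRAMA", tok.upper()])
        else:
            out.append(tok)
    return out


def strip_suffix(tokens, suffix):
    """Chop off the rightmost token if it equals the given suffix."""
    if tokens and tokens[-1] == suffix:
        return tokens[:-1]
    return list(tokens)


kri = to_full_virama(["K", "R", "i"])    # K+R+VIRAMA+I
jaai = to_full_virama(["J", "aa", "I"])  # J+VIRAMA+AA+I
print(strip_suffix(kri, "I"))
print(strip_suffix(jaai, "I"))
```

In the full VIRAMA representation only strip_suffix would be needed;
with the current encoding to_full_virama has to run first.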

Encoding /ki/ as <K><i> (using lowercase vowels to
denote combining forms and letters within slashes to
denote phonemes rather than characters) is also
semantically inappropriate. <K> stands for /ka/, not
/k/, and <i>, being a combining form of <I>, simply
stands for /i/. So <K><i> should stand for /kai/
rather than /ki/ unless a VIRAMA is inserted between
the <K> and the <i> to remove the default inherent
vowel /a/ from <K>.
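
The compositional reading argued for here can be sketched in a few
lines of Python. The rules are exactly the three stated above: a
consonant carries the inherent /a/, a VIRAMA removes it, and a vowel
(combining or independent) contributes only its own phoneme. The
VOWELS set and the function name are hypothetical, chosen only for
this illustration.

```python
# Illustrative vowel inventory in the thread's notation (not exhaustive).
VOWELS = {"A", "AA", "I", "II", "U", "UU", "E", "O"}


def read_compositionally(tokens):
    """Read a token sequence strictly character by character."""
    phonemes = []
    last_is_inherent = False  # True while the last phoneme is an inherent /a/
    for tok in tokens:
        if tok == "VIRAMA":
            # The virama removes the inherent vowel of the preceding consonant.
            if last_is_inherent:
                phonemes.pop()
            last_is_inherent = False
        elif tok.upper() in VOWELS:
            # A vowel, combining or independent, stands only for itself.
            phonemes.append(tok.lower())
            last_is_inherent = False
        else:
            # A consonant stands for consonant + inherent /a/.
            phonemes.append(tok.lower())
            phonemes.append("a")
            last_is_inherent = True
    return "/" + "".join(phonemes) + "/"


print(read_compositionally(["K"]))                 # /ka/
print(read_compositionally(["K", "i"]))            # /kai/
print(read_compositionally(["K", "VIRAMA", "I"]))  # /ki/
```

Under these rules <K><i> comes out as /kai/, and only the sequence
with an explicit VIRAMA yields /ki/.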

I hope this makes sense. Best, Gautam.



This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST