RE: Normalization in Bengali

From: Michael Maxwell (mmaxwell@casl.umd.edu)
Date: Wed Nov 22 2006 - 21:41:35 CST


    Peter Constable wrote:
    > The upshot of 09C1 and 0981 having canonical combining class = 0
    > is that each differently-ordered sequence involving these
    > characters is in a distinct equivalence class -- i.e.,
    > the sequences are not considered equivalent.
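
    For anyone who wants to check the quoted point, here is a minimal sketch using Python's standard unicodedata module. The consonant KA (U+0995) and the helper name equivalent() are my own choices for illustration, not anything from the standard. A Latin pair with non-zero combining classes is included only as a contrast case.

```python
import unicodedata

KA = "\u0995"           # BENGALI LETTER KA (arbitrary base consonant for the demo)
VOWEL_U = "\u09C1"      # BENGALI VOWEL SIGN U
CANDRABINDU = "\u0981"  # BENGALI SIGN CANDRABINDU

# Canonical combining class of each mark; both are expected to be 0.
ccc = {unicodedata.name(c): unicodedata.combining(c)
       for c in (VOWEL_U, CANDRABINDU)}

vowel_first = KA + VOWEL_U + CANDRABINDU  # C + V + CAN
can_first = KA + CANDRABINDU + VOWEL_U    # C + CAN + V


def equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: canonical equivalence tested by comparing NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print(ccc)
# Class-0 marks are never reordered, so the two Bengali orders stay distinct.
print(equivalent(vowel_first, can_first))
# Contrast: dot below (ccc 220) and acute (ccc 230) are reordered by
# normalization, so these two Latin sequences do compare as equivalent.
print(equivalent("a\u0323\u0301", "a\u0301\u0323"))
```

    Normalization therefore cannot be used to fold one Bengali order into the other; the choice of order has to be made when the text is encoded.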

    OK, so let me return to my original question, slightly rephrased: which order of <DependentVowel> and <Chandrabindu> should text be using? My intuition tells me that it should be the Vowel-CAN order (CAN = Chandrabindu), not *CAN-Vowel. I attribute that to a linguistic intuition that the more fundamental part--the vowel--should come before the less fundamental part--the nasalization on the vowel. (Note that there is no phonological sequence here; the vowel and its nasalization are articulated more or less simultaneously.)

    Experimenting with various consonants and vowels + Chandrabindu leads me to the same conclusion, since only this order works consistently with the Microsoft Vrinda font. (Sometimes both orders work OK, as they did with my original example.) So whoever built that font must have thought like I did :-). (The Microsoft Arial Unicode MS font renders some instances of C+V+CAN correctly, but not all. It renders most, if not all, C+CAN+V sequences incorrectly. I have not tried other fonts, nor have I tried this under other operating systems.)
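
    A rough sketch of how test strings for this kind of experiment could be generated, for pasting into an application that uses the font under test. The particular consonants and vowel signs are an arbitrary sample I picked for illustration, and the names ordering_pairs() and code_points() are hypothetical helpers.

```python
CANDRABINDU = "\u0981"                                  # BENGALI SIGN CANDRABINDU
CONSONANTS = ["\u0995", "\u099A", "\u09AA"]             # KA, CA, PA (arbitrary sample)
VOWEL_SIGNS = ["\u09BE", "\u09BF", "\u09C1", "\u09C7"]  # AA, I, U, E (arbitrary sample)


def code_points(text: str) -> str:
    """Spell a string out as U+XXXX values so the encoded order is visible."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def ordering_pairs():
    """Yield (C+V+CAN, C+CAN+V) for every sampled consonant and vowel sign."""
    for consonant in CONSONANTS:
        for vowel in VOWEL_SIGNS:
            yield (consonant + vowel + CANDRABINDU,
                   consonant + CANDRABINDU + vowel)


for vowel_first, can_first in ordering_pairs():
    # Each row shows both orders with their code points, for side-by-side
    # visual comparison in the font being tested.
    print(f"{vowel_first}  [{code_points(vowel_first)}]    "
          f"{can_first}  [{code_points(can_first)}]")
```

    Whether a given row renders correctly still has to be judged by eye; the script only guarantees that the two columns differ in nothing but the order of the vowel sign and the Chandrabindu.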

    This ordering issue arose from a difference of opinion over the "correct" ordering. The other person, who thought that the C+CAN+V order was "correct", reasoned that the Chandrabindu always appears on top of the consonant, whereas the vowel's position is variable--I guess the idea was that the vowel is therefore more loosely attached to the consonant. (The other person might be able to state their position better.)

    If the "correct" order is not determined by the canonical combining class, then what in the Unicode standard _does_ tell me the correct/preferred order? It obviously makes a difference in font "implementations", so I'm assuming the standard must say so somewhere. Where is it?

       Mike Maxwell
       CASL/ U Md


