Re: [indic] Unicode Processing Requirements for Tamil

From: Christopher Fynn (cfynn@gmx.net)
Date: Fri Sep 02 2005 - 05:15:46 CDT


    Peter

    It might be worth looking at a solution to this from the POV of Indic
    scripts in general - as the use of this kind of superscribed and subscribed
    (combining mark) digits is probably not confined to Tamil.

    I know that Tibetan has superscribed and subscribed digits which occur
    directly above and below text - and I have many examples of these in both
    old documents and contemporary publications.

    I've also seen a detailed document on various kinds of "Vedic Accents" -
    and recall that one set of these characters used digits like this rather
    than other kinds of mark.

    Rather than ending up encoding a set of such characters for each Indic
    script, perhaps there could be a single set of combining superscribed
    digits and a single set of subscribed combining digits whose appearance
    would be contextually dependent on the script in which they occurred.

    - Chris

    Peter Constable wrote:
    >>From: indic-bounce@unicode.org [mailto:indic-bounce@unicode.org] On Behalf Of
    >>Richard Wordingham
    >
    >
    >
    >>What should one do to get superscript (and ideally also subscript) digits
    >>supported in Tamil text? Section 9.6 Paragraph 2 of the Unicode Standard
    >>(from 4.0) says...
    >
    >
    > How obvious! Latin digits are part of the Tamil script. How could we have missed that?
    >
    >
    >
    >>However, combinations such as பெ⁴ௗ /bhau/ U+0BAA U+0BC6 U+2074 U+0BD7
    >>and
    >>பெ₄ௗ /bhau/ U+0BAA U+0BC6 U+2084 U+0BD7 do not render properly on
    >>Windows
    >>XP - the dotted circle appears before the final element of the compound
    >>vowel.
    >
    >
    > Sure, because (unless you happen to notice this bit of text buried in the standard), the Latin superscript digits are treated as *not* being part of the same script run, and so the cluster is broken, etc.
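A rough way to see the run break described above is to split the quoted sequence wherever the characters stop looking like Tamil. This is only a sketch: Python's standard library has no Script property lookup, so the hypothetical helper `rough_script` uses the first word of each character name as a stand-in, and `script_runs` is likewise an illustrative name. It is not how any real shaping engine itemizes text.

```python
import unicodedata

def rough_script(ch):
    # Assumption: the first word of the character name approximates the
    # script. The real Script property (UAX #24) is not in the stdlib.
    return unicodedata.name(ch).split()[0]

def script_runs(text):
    # Group consecutive characters that share the same rough script label.
    runs = []
    for ch in text:
        label = rough_script(ch)
        if runs and runs[-1][0] == label:
            runs[-1][1].append(ch)
        else:
            runs.append((label, [ch]))
    return [(label, "".join(chars)) for label, chars in runs]

# The sequence quoted in the message: U+0BAA U+0BC6 U+2074 U+0BD7
seq = "\u0BAA\u0BC6\u2074\u0BD7"

for label, run in script_runs(seq):
    print(label, [f"U+{ord(c):04X}" for c in run])
```

Under this approximation the superscript digit lands in a run of its own, leaving U+0BD7 stranded in a third run with no base consonant, which is consistent with the dotted circle reported before the final element.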
    >
    >
    >
    >>How would you recommend the Unicode Standard be strengthened so that
    >>Microsoft feels obliged to support the superscripts and subscripts in
    >>combination with non-conjoined following vowels?
    >
    >
    > I don't think making the Standard stronger is an issue here. It's more a matter of users identifying a need, and the intended behaviour being clear to implementers.
    >
    > On the first point, you have now brought this to our attention, though given that users have been working with our implementation for Tamil ever since the Windows 2000 beta (six? years) and nobody has mentioned this until it is brought up now by (IIUC) a casual user of Tamil, it's not obvious to me that supporting this should be a particular priority. I'd want to know that regular users of Tamil are impacted significantly.
    >
    > On the second point, I'd want to see samples of this shown in running text so that I can see how it's really used. And then there's the matter of encoded representation, which the Standard really doesn't clarify. You suggested sequences of the form
    >
    > < 0BAA, 0BC6, 2074, 0BD7 >
    >
    > i.e.
    >
    > < cons, pre-matra, sup_digit, post-matra >
    >
    > But it seems to me that should really be
    >
    > < cons, sup_digit, matras... >
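One concrete difference between the two orderings can be checked with canonical normalization. The sketch below assumes that `< cons, sup_digit, matras... >` maps to U+0BAA U+2074 U+0BC6 U+0BD7 for the example in this thread; that mapping is a reading of the message, not something it spells out. The variable names are illustrative.

```python
import unicodedata

# Ordering suggested in the original question:
# < cons, pre-matra, sup_digit, post-matra >
digit_between = "\u0BAA\u0BC6\u2074\u0BD7"

# Ordering proposed in the reply (assumed concrete form):
# < cons, sup_digit, matras... >
digit_first = "\u0BAA\u2074\u0BC6\u0BD7"

def code_points(text):
    # Render a string as a list of U+XXXX labels for easy comparison.
    return [f"U+{ord(c):04X}" for c in text]

for label, text in (("digit between", digit_between),
                    ("digit first", digit_first)):
    nfc = unicodedata.normalize("NFC", text)
    print(label, code_points(text), "->", code_points(nfc))
```

With the digit placed right after the consonant, the two vowel-sign parts stay adjacent and NFC composes them into the single AU vowel sign U+0BCC. With the digit in between, they cannot compose, so the two-part vowel is never seen as one unit.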
    >
    > There's also the question of how many of the digits are needed, but I gather it's just 2, 3 and 4 (to fill out the four-way contrast for a given point of articulation).
    >
    >
    >
    >>I think mention of subscripts should be added,
    >
    >
    > Stop right there. If this is your invention, I'm not interested. Provide evidence of a user community before you ask for subscripts.
    >
    > So, the long and short as far as MS is concerned is (i) we're aware of a potential need, (ii) we've nothing to indicate that there's much user demand and that this needs to be a priority, and (iii) clarification of the encoding spec would be needed before we could consider any change.
    >
    >
    >
    > Peter Constable
    >
    >
    >


