Re: Printing and Displaying Dependent Vowels

From: jameskass@att.net
Date: Sun Mar 28 2004 - 21:35:09 EST

    C J Fynn responded to John Hudson,

    > If someone wants this, isn't it possible to put a specific lookup in the font
    > so that any dependent vowel following a space character renders as a spacing
    > (stand-alone) dependent vowel? Surely a specific lookup should override it being
    > displayed on a dotted circle by default.

    Has anyone tried this? Would the space glyph U+0020 be expected to trigger
    a look-up in the Tamil GSUB table as if it were a Tamil base character?

    I haven't tried this because, in the OpenType look-ups here for the
    "re-ordrant" vowel signs of Tamil, the vowel sign is "INPUT1" and the base
    letter is "INPUT2". That order exists only because the rendering engine has
    already re-ordered the character string before the look-up is applied. It
    doesn't seem likely that a rendering engine would re-order a vowel sign
    before a space. It could be tested both ways, I suppose...
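    A minimal sketch of that pre-base reordering step may make the point clearer.
    It assumes a simplified model, not Uniscribe's actual code: only TAMIL VOWEL
    SIGN E, EE and AI (U+0BC6..U+0BC8) are moved, and only when the preceding
    character is a Tamil consonant (U+0B95..U+0BB9). The helper names are
    hypothetical. Because a space never qualifies as a base in this model, the
    vowel sign stays after it, so a look-up expecting "vowel sign, then base"
    would never match.

```python
# Hypothetical sketch of Tamil pre-base vowel reordering; not Uniscribe's real algorithm.
PRE_BASE_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # TAMIL VOWEL SIGN E, EE, AI


def is_tamil_consonant(ch: str) -> bool:
    # Simplification: treat the whole KA (U+0B95) .. HA (U+0BB9) span as consonants,
    # ignoring the unassigned code points inside it.
    return "\u0B95" <= ch <= "\u0BB9"


def reorder_pre_base(text: str) -> list[str]:
    """Return the characters of `text` in visual (glyph) order."""
    out: list[str] = []
    prev = ""  # previous character in *logical* order
    for ch in text:
        if ch in PRE_BASE_SIGNS and is_tamil_consonant(prev):
            # Move the vowel sign in front of the consonant it logically follows.
            out.insert(len(out) - 1, ch)
        else:
            # Anything else, including a vowel sign after a space, stays put.
            out.append(ch)
        prev = ch
    return out


# KA + VOWEL SIGN E: the sign comes first, so a GSUB rule would see it as INPUT1.
print([f"U+{ord(c):04X}" for c in reorder_pre_base("\u0B95\u0BC6")])
# SPACE + VOWEL SIGN E: no base consonant, so nothing is reordered.
print([f"U+{ord(c):04X}" for c in reorder_pre_base(" \u0BC6")])
```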

    This seems to be off-topic (OT) for this list, but here it is, and it will
    probably keep popping up from time to time unless it is clarified.

    I can only make inferences and suppositions based on observing the behavior
    of the rendering engine used here, Microsoft's "Uniscribe", and on guessing at
    the reasoning behind that behavior. People who know all about this do follow
    this list, so they're free to offer corrections.

    <inference and supposition>

    Uniscribe inserts the dotted circle into the display for complex scripts in
    order to give a visual indication of an encoding or spelling error. This seems
    quite useful whether text is being entered or merely displayed.

    Allowing dependent vowels to render on the space character would defeat this
    purpose. In other words, somebody could write a Tamil word on a web page
    starting with the E vowel sign (U+0BC6), and there'd be no indication that
    this is improper, either to the author or to the visitor.
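    The dotted-circle behavior described above can be approximated with a short
    check. This is only a sketch under stated assumptions, not how Uniscribe is
    actually implemented: a Tamil combining sign counts as valid only right after
    a Tamil consonant, or as the second half of a decomposed two-part vowel
    (E + AA, EE + AA, E + AU LENGTH MARK); otherwise U+25CC DOTTED CIRCLE is put
    in front of it. The function and set names are hypothetical.

```python
# Hypothetical sketch of dotted-circle insertion for Tamil; not Uniscribe's real code.
DOTTED_CIRCLE = "\u25CC"

TAMIL_COMBINING_SIGNS = {
    "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",  # vowel signs AA, I, II, U, UU
    "\u0BC6", "\u0BC7", "\u0BC8",                      # vowel signs E, EE, AI (pre-base)
    "\u0BCA", "\u0BCB", "\u0BCC",                      # vowel signs O, OO, AU (two-part)
    "\u0BCD",                                          # virama (pulli)
    "\u0BD7",                                          # AU length mark
}

# Second halves allowed after a first-half sign: the canonical decompositions
# of O (E + AA), OO (EE + AA) and AU (E + AU LENGTH MARK).
TWO_PART_PAIRS = {("\u0BC6", "\u0BBE"), ("\u0BC7", "\u0BBE"), ("\u0BC6", "\u0BD7")}


def is_tamil_consonant(ch: str) -> bool:
    # Simplification: KA (U+0B95) .. HA (U+0BB9), unassigned code points ignored.
    return "\u0B95" <= ch <= "\u0BB9"


def insert_dotted_circles(text: str) -> str:
    """Put DOTTED CIRCLE before every Tamil combining sign that has no valid base."""
    out: list[str] = []
    prev = ""
    for ch in text:
        has_base = is_tamil_consonant(prev) or (prev, ch) in TWO_PART_PAIRS
        if ch in TAMIL_COMBINING_SIGNS and not has_base:
            out.append(DOTTED_CIRCLE)  # mark the sign as being "in isolation"
        out.append(ch)
        prev = ch
    return "".join(out)


# A word that wrongly starts with VOWEL SIGN E (after a space) gets a dotted circle.
print(insert_dotted_circles(" \u0BC6\u0B95") == " \u25CC\u0BC6\u0B95")
```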

    Someone searching for that word on that page wouldn't find it, since the
    misspelled sequence wouldn't match the correctly encoded word, and so on.

    Perhaps the original author should use some kind of spell-checker, but there
    seems to be no way to ensure that the author of any web page one visits has
    done so.

    It is the very appearance of that dotted circle unexpectedly in our texts which
    alerts us to the fact that we have made a mistake. That dotted circle jumps out
    of the page into our vision exclaiming, "Hey, I'm wrong! I'm so wrong, don't
    even bother running your spell-checker on me!"

    This is the basis upon which Uniscribe renders text that includes dependent
    vowel signs, not just for Tamil but for the other so-called "complex" scripts,
    too. The dotted circle plus the matra (dependent vowel sign) is the default
    rendering for combining marks *in isolation*. Uniscribe seems, rightly, to treat
    a vowel sign following a space as being in isolation, and how could it do
    otherwise? What goes for the space character also seems to go for any other
    character which is not a valid character *within that script's Unicode range*.
    Again, how could it be otherwise? If the first character in a string isn't a
    Tamil character, there's no reason for the renderer to consult the Tamil
    OpenType tables in a font. If it did, my gosh, imagine all the pointless
    look-ups just to display a page which was, for example, mostly Chinese with a
    few Tamil phrases.
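    The point about not consulting Tamil tables for non-Tamil text rests on script
    itemization: text is split into runs by script before shaping, and only Tamil
    runs go to Tamil shaping. The sketch below is a deliberate simplification, not
    Uniscribe's itemizer and not the full Unicode script-property rules: it
    recognizes only the Tamil block (U+0B80..U+0BFF), lumps every other script
    into a placeholder "Other", and lets spaces and a few punctuation marks join
    whichever run precedes them. The names are hypothetical.

```python
# Hypothetical, simplified script itemizer; not Uniscribe's actual behavior.
COMMON_PUNCTUATION = set(".,;:!?")


def script_of(ch: str) -> str:
    if "\u0B80" <= ch <= "\u0BFF":  # Tamil block
        return "Tamil"
    if ch.isspace() or ch in COMMON_PUNCTUATION:
        return "Common"  # shared by all scripts
    return "Other"  # placeholder for every non-Tamil script (e.g. Han)


def itemize(text: str) -> list[tuple[str, str]]:
    """Split text into (script, run) pairs; Common characters extend the previous run."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        script = script_of(ch)
        if script == "Common" and runs:
            script = runs[-1][0]  # a space stays with the run it follows
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


# Only the "Tamil" runs would be handed to a Tamil shaper; the Han text never is.
for script, run in itemize("\u4e2d\u6587 \u0B95\u0BC6 \u4e2d"):
    print(script, [f"U+{ord(c):04X}" for c in run])
```

    In this model, a Han character, a space, and then VOWEL SIGN E yield a Tamil
    run that begins with the vowel sign itself, so the sign has no base and ends
    up in isolation.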

    <end of supposition and inference>

    The good folks engineering Uniscribe have been most responsive to all kinds
    of special requests and pointers related to complex script shaping.

    I think asking them to break the existing mechanism in order to support
    vowel signs on spaces asks too much, though.

    People generating texts for educational purposes will always have special needs,
    so they'll always need to make a special effort to get special effects. Workarounds
    for the original question have already been suggested.

    If this is treated as a Unicode issue rather than a display issue, then one solution
    (back on topic a little bit) would be for someone to propose a new character,
    COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
    Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine
    could be changed to insert this new character. These updated rendering engines
    could then be distributed, and font developers could add the new character
    to fonts and distribute updated fonts. This might just take a while, but it
    wouldn't be too hard to find examples of the character in actual text use to
    accompany the proposal...

    "If it ain't broke, don't fix it." So, is it 'broke'?

    Best regards,

    James Kass



    This archive was generated by hypermail 2.1.5 : Sun Mar 28 2004 - 22:22:09 EST