From: jameskass@att.net
Date: Sun Mar 28 2004 - 21:35:09 EST
C J Fynn responded to John Hudson,
> If someone wants this, isn't it possible to put a specific lookup in the font
> so that any dependant vowel following a space character renders as a spacing
> (stand-alone) dependant vowel? Surely a specific lookup should overide it being
> displayed on a dotted circle by default.
Has anyone tried this? Would the space glyph U+0020 be expected to trigger
a look-up in the Tamil GSUB table as if it were a Tamil base character?
The reason that I haven't tried this is because, in the OpenType look-ups here
for the "re-ordrant" vowel signs of Tamil, the vowel sign is "INPUT1" and the
base letter is "INPUT2". This is because the rendering engine has already
re-ordered the character string before this look-up is performed. It doesn't
seem likely that a rendering engine would re-order a vowel sign before a space.
It could be tested both ways, I suppose...
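To make the ordering concrete, here is a rough sketch of the kind of re-ordering described above, in which a pre-base vowel sign is moved ahead of its base consonant before any GSUB look-up sees the string. This is only an illustration under stated assumptions, not Uniscribe's actual algorithm: the function names are made up, only the three simple pre-base signs (U+0BC6, U+0BC7, U+0BC8) are handled, and the two-part vowel signs are ignored.

```python
# Hypothetical sketch only; the real shaping engine's logic is not public here.
PRE_BASE_VOWEL_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}  # Tamil vowel signs E, EE, AI


def is_tamil_consonant(ch):
    # Tamil consonants sit in U+0B95..U+0BB9 (some code points there are unassigned).
    return "\u0B95" <= ch <= "\u0BB9"


def reorder_for_display(text):
    """Move each pre-base vowel sign in front of the consonant it follows."""
    out = []
    for ch in text:
        if ch in PRE_BASE_VOWEL_SIGNS and out and is_tamil_consonant(out[-1]):
            base = out.pop()
            out.extend([ch, base])  # vowel sign first, then the base letter
        else:
            out.append(ch)  # no consonant before it: nothing to re-order
    return "".join(out)
```

Under these assumptions, KA + E-vowel-sign comes out as E-vowel-sign + KA, while SPACE + E-vowel-sign is left in its original order, so a look-up written with the vowel sign as the first input and the base as the second would never match the space case.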
This seems to be OT for this list, but, here it is, and it will probably keep
popping up from time to time unless clarified.
I can only make inferences and suppositions based on observation of the
behavior and reasoning behind the behavior of the rendering engine used
here, Microsoft's "Uniscribe". People who know all about this do follow
this list, so they're free to offer corrections.
<inference and supposition>
Uniscribe inserts the dotted circle into the display for complex scripts in
order to give a visual indication of an encoding or spelling error. This seems
quite useful whether text is being entered or merely displayed.
Allowing dependent vowels to follow the space character breaks this utility.
In other words, somebody could write a Tamil word in a web page starting
with the E-vowel-sign (U+0BC6), and there'd be no indication that this is
improper, either to the author or the visitor.
Someone searching for that word on that page wouldn't find it, and so on.
Maybe some kind of spell-checker should be used by the original author, but,
there seems to be no way to assure that spell-checking was performed by the
author of any web page one visits.
It is the very appearance of that dotted circle unexpectedly in our texts which
alerts us to the fact that we have made a mistake. That dotted circle jumps out
of the page into our vision exclaiming, "Hey, I'm wrong! I'm so wrong, don't
even bother running your spell-checker on me!" This is the basis upon which
Uniscribe renders text which includes dependent vowel signs, not just for Tamil,
but for the other so-called "complex" scripts, too. The dotted circle plus the
matra is the default rendering for combining marks *in isolation*. Uniscribe
seems to rightly treat a vowel sign following a space as being in isolation, and,
how could it do otherwise? What goes for the space character also seems to
go for any other character which is not a valid character *within the script's
Unicode range*. Again, how could it be otherwise? If the first character in a string
isn't a Tamil character, there's no reason for the renderer to consult the Tamil
OpenType tables in a font. If it did, my gosh, imagine all the pointless look-ups
just to display a page which was, for example, mostly Chinese with a few Tamil
phrases.
<end of supposition and inference>
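The supposed behavior can be sketched in a few lines. This is a guess at the observable effect only, not Uniscribe's implementation: the helper names are hypothetical, "valid base" is simplified to "any Tamil letter or mark", and a real engine applies much stricter syllable rules.

```python
import unicodedata

DOTTED_CIRCLE = "\u25CC"


def in_tamil_block(ch):
    # The Tamil block is U+0B80..U+0BFF.
    return "\u0B80" <= ch <= "\u0BFF"


def is_tamil_mark(ch):
    # Dependent vowel signs, virama, etc. have general category Mn or Mc.
    return in_tamil_block(ch) and unicodedata.category(ch).startswith("M")


def is_tamil_letter(ch):
    # Simplification (assumption): any Tamil letter counts as a possible base.
    return in_tamil_block(ch) and unicodedata.category(ch) == "Lo"


def insert_dotted_circles(text):
    """Put U+25CC before any Tamil mark that has no Tamil character before it."""
    out = []
    prev = ""
    for ch in text:
        has_base = bool(prev) and (is_tamil_letter(prev) or is_tamil_mark(prev))
        if is_tamil_mark(ch) and not has_base:
            out.append(DOTTED_CIRCLE)  # mark is "in isolation"
        out.append(ch)
        prev = ch
    return "".join(out)
```

With this sketch, an E-vowel-sign after a space, after a Latin letter, or at the start of a string gets a dotted circle, and the same sign after KA does not.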
The good folks engineering Uniscribe have been most responsive to all kinds
of special requests and pointers related to complex script shaping.
I think asking them to break the existing mechanism in order to support
vowel signs on spaces asks too much, though.
People generating texts for educational purposes will always have special needs.
So, they'll always need to make special effort to get special effects. Workarounds
concerning the original question have already been suggested.
If this is treated as a Unicode issue rather than a display issue, then one solution
would be for someone to propose a new character (back on topic a little bit):
COMBINING DOTTED CIRCLE FOR COMBINING MARKS.
Then, rather than inserting DOTTED CIRCLE into the display, a rendering engine
could be changed to insert this new character. Then, these updated rendering
engines could be distributed and font developers could add the new characters
to fonts and distribute updated fonts. This might just take a while, but it
wouldn't be too hard to find examples of the character in actual text use to
accompany the proposal...
"If it ain't broke, don't fix it." So, is it 'broke'?
Best regards,
James Kass