Re: Devanagari question

From: Marco Cimarosti (
Date: Tue Nov 14 2000 - 09:47:23 EST

Antoine Leca wrote:
> Marco Cimarosti wrote:
> >
> > I think that the original idea behind having combining marks in Unicode was
> > that *any* combination of base + diacritic should be permitted,
> The fact that it is permitted (as I said, they "are not prohibited")
> does not per se give them any sense...
> This was my point, but I was not clear enough.

Your point was clear, and your statement is certainly true: there are
millions of possible combinations of base-character + accent and, clearly, most
of them are meaningless.

But my point was: not even Mr. Ethnologue himself knows exactly *which*
combinations are meaningful, in all orthographic systems. And, clearly, no
one can figure out which combinations may become meaningful in the *future*
-- e.g. when a previously unwritten language gets its orthography, or when
the spelling of an already written language gets changed.

So, it makes sense -- and is probably economically worthwhile -- to have a
generalized mechanism to render virtually any combination that could arise.

> > and be handled decently by rendering engines.
> The question here is the meaning of "decently".

Sorry for my sloppy expression. A better term would have been "readably".
I.e., good enough to be accepted and understood by a human reader.

> I beg your pardon, but as the programmer of a rendering engine, I cannot
> agree that I should spend hours and days, and furthermore adding megabytes
> of code, to render "decently" combinations like digits + accents (by
> decently, I mean I should check if the glyph for the digit have ascender
> above x-height, or being of narrower width, and then adjust the position of
> the diacritic accordingly; similarly, adjusting the descender position of the
> Nagari virama according to the descender depth of a preceding
> "g" or "j" or "y".)

If you put it this way, I agree that it definitely doesn't make sense to
spend a single minute of your life, or a single byte in your computer, to
support crazy combinations like <Latin y + Devanagari virama>.

But try and restate the same thing with different words, and it might sound
quite different.

Call it "providing a general solution to a common problem", and you see
that, by implementing such a solution *once and for all*, you could end up
spending *less* time and resources than is required to develop (and fix, and
extend, and redesign, and explain...) an ad-hoc solution for every (class
of) combination(s).

Moreover, Mark Davis has already pointed out that the complexity of the task
and its memory requirements are being somewhat exaggerated here.

> At the contrary, I believe that when a combination is not expected, the
> renderer should have a very basic and straightforward behaviour, and just
> "print" the default glyphs in order, with overstriking when
> the second glyph is a combining mark.

Overstriking is not that bad, as a first approximation!

The next step is positioning the diacritic sign more or less outward, in
order not to collide with the base sign. When you have this, you have a
generalized solution -- any further enhancement is more esthetic than

Mark reminded us that these two techniques are explicitly described in the
Unicode standard as possible implementation strategies.

Overstriking is clearly a poor solution, but viable in many cases.
Contextual positioning of accents is more complex, but it certainly doesn't
require years of development or megabytes of RAM!
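
A minimal sketch of the two strategies, in Python, may make the difference
concrete. Everything in it is an assumption made for illustration: the `Box`
type, the function names, the 50-unit gap and the sample bounding boxes are
hypothetical and are not taken from the Unicode standard or from any real
rendering engine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    """Ink bounding box of a glyph, in font units (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def overstrike_offset(base: Box, mark: Box) -> Tuple[float, float]:
    """Strategy 1: draw the mark where the font designed it, unmoved."""
    return (0.0, 0.0)


def above_offset(base: Box, mark: Box, gap: float = 50.0) -> Tuple[float, float]:
    """Strategy 2: center the mark over the base and push it upward
    only as far as needed to keep `gap` units clear of the base."""
    dx = (base.x_min + base.x_max) / 2 - (mark.x_min + mark.x_max) / 2
    dy = max(0.0, base.y_max + gap - mark.y_min)
    return (dx, dy)


def below_offset(base: Box, mark: Box, gap: float = 50.0) -> Tuple[float, float]:
    """Same idea for marks below: push downward past the base's descender."""
    dx = (base.x_min + base.x_max) / 2 - (mark.x_min + mark.x_max) / 2
    dy = min(0.0, base.y_min - gap - mark.y_max)
    return (dx, dy)


# Made-up boxes: an x-height letter, a tall digit, a letter with a
# descender, an accent designed to sit above x-height, and a below-mark.
LOW_LETTER = Box(0, 0, 500, 480)
TALL_DIGIT = Box(0, 0, 500, 700)
DESCENDER_LETTER = Box(0, -220, 500, 480)
ACCENT = Box(150, 550, 350, 700)
BELOW_MARK = Box(200, -200, 400, -50)
```

With these sample boxes, `above_offset(LOW_LETTER, ACCENT)` leaves the accent
where it is, while `above_offset(TALL_DIGIT, ACCENT)` raises it by 200 units
so that it clears the taller glyph. The same few lines serve every base
glyph, which is the sense in which the solution is "generalized".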

> Doing something more complex, in addition to be IMHO
> a complete lost of time for both the programmer and the users
> (to load unusued code), is also likely to give some users the
> idea that using some weird combinations are handled this ("clever") way
> everywhere, thus leading to chaos when the datas will be brought elsewhere.

This is where the misconception sits, IMHO. If you spend time to come up with
a *general* solution, it is because you will generally use it! However complex
it might have been to develop, it was worth it, because you use it all the
time.

But taking the burden of developing a general solution and only using it for
*weird* cases would indeed be a loss of time -- and an illogical behavior
too (rather like using different forks to eat meat and fish:-).

What I mean is that, once this is in place, it should be used also (and
primarily) for common combinations like: , , , , , , , , , , , ,
, , , , etc.

So, the time that is spent in designing the rendering engine will generously
be repaid by the time saved in designing fonts.

Only in a few exceptional cases may fonts need manually tuned "accented
glyphs", for special combinations: , , , , etc. But this is not unique to
accents: even perfectly "spacing" letters have ligatures for special cases:
fi, fl, ff, ffi, ffl, etc.
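
The division of labour described above can be sketched as follows: a
hand-tuned glyph is used when the font happens to supply one, and the general
positioning mechanism handles everything else. This is only an illustration
under stated assumptions; the `TUNED` table, the glyph names and the `plan`
function are hypothetical and do not describe any real font format or engine.

```python
from typing import Dict, List, Tuple

# Hypothetical per-font table of manually tuned composite glyphs,
# keyed by (base character, combining mark).
TUNED: Dict[Tuple[str, str], str] = {
    ("a", "\u0301"): "aacute.tuned",
}


def plan(base: str, mark: str,
         tuned: Dict[Tuple[str, str], str] = TUNED) -> List[str]:
    """Return the glyphs to draw for one base + combining mark pair."""
    key = (base, mark)
    if key in tuned:
        # Exceptional case: the font designer supplied a special glyph.
        return [tuned[key]]
    # General case: draw the base glyph, then the mark positioned
    # dynamically relative to it.
    return [f"glyph:{base}", f"positioned-mark:U+{ord(mark):04X}"]
```

A pair listed in the table, such as `plan("a", "\u0301")`, yields the single
tuned glyph; an unanticipated pair, such as `plan("7", "\u0301")`, still
yields something drawable instead of nothing.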

> > If font designers and d. engines implementers insist in the idea that an
> > "accented letter" may be rendered only if an ad-hoc glyph has been
> > anticipated in the font, many minority languages will never have a chance of
> > being supported at a reasonable cost.
> I never say (nor I hope I implied) such an idea.

I didn't mean that you think so. (I didn't know you implemented a rendering
engine, and I don't know how it works, so I was not referring to you).

I was talking about a certain bias towards precomposed characters and glyphs
that *does* exist in the industry.

> Now, insisting that any renderer should align properly any diacritic on the
> top (or bottom) middle of the I, M and W glyph, will have for
> net result that nobody will never be able to create any renderer...

As long as the basic requirement of readability is met, I see no problem if
different solutions have different levels of sophistication.

If an application displays accents on "w" slightly too far to the right, I
would call it a perfectly readable implementation (although, if I were a
Welshman, I'd probably consider it very ugly).

> > Less common combinations, used in less known languages, may get along with a
> > less-than-perfect rendering -- but *no* rendering at all is not acceptable,
> Where anyone stated such an idea?

You mean the idea that a total lack of rendering is unacceptable? Or that a
default ("less-than-perfect") rendering can be acceptable for very uncommon
combinations? Both ideas are mine, although I think they are common sense.

_ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:15 EDT