Re: dotless j

From: Peter_Constable@sil.org
Date: Mon Jul 05 1999 - 09:19:51 EDT


>How do you make a j lose its dot if you do not have a dotless j available? I
>don't get it. It would seem to make sense to me to bend the rules in this case
>and have a dotless j even if it is a glyph and not a character used by any
>language.

A font can contain the dotless j glyph without a cmap entry that accesses it
directly, i.e. without any Unicode value being associated with it. In the same
way, Unicode doesn't need to encode every Arabic contextual form and ligature,
every positional variant of Thai diacritics, every Devanagari conjunct, every
Hangul syllable, etc.
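
As a rough illustration of a glyph that lives in a font but has no cmap entry,
here is a toy sketch. It does not use any real font API: the CMAP and GLYPHS
tables, the glyph names, and the shape() function are all made up for the
example, and the substitution rule (swap j for dotless j before a combining
mark of class "above") is an assumption about what such a font might do.

```python
import unicodedata

# Hypothetical toy font. Glyph names are invented for this example.
# The cmap maps Unicode characters to glyph names.
CMAP = {"a": "a", "j": "j", "\u0302": "circumflexcomb"}

# The font's full glyph set. "dotlessj" is present but has no cmap entry,
# so no character code reaches it directly.
GLYPHS = {"a", "j", "circumflexcomb", "dotlessj"}

ABOVE = 230  # canonical combining class of marks placed above the base


def shape(text):
    """Map characters to glyph names, then apply one substitution rule."""
    glyphs = [CMAP[ch] for ch in text]  # raises KeyError for unmapped characters
    for i in range(len(glyphs) - 1):
        # Assumed rule: a j directly followed by a mark above loses its dot.
        if glyphs[i] == "j" and unicodedata.combining(text[i + 1]) == ABOVE:
            glyphs[i] = "dotlessj"
    return glyphs
```

Calling shape("j\u0302") yields the dotless glyph followed by the circumflex,
while shape("ja") leaves the ordinary j alone. The caller only ever supplies
Unicode characters.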

>If we really want to convince all programmers to use Unicode, we can hardly
>insist that they add low level code to every single program they write to remove
>the dot from the j by directly manipulating the fonts.

No more than we can insist that they all write their own code to render a bunch
of other scripts. The answer for them is general-purpose rendering code (be it
in the form of libraries, system calls, whatever) that takes as input Unicode
characters, a font identifier, and a designation of the intended
language/writing system (in the general case, the Unicode characters alone are
not enough) and that returns the appropriate sequence of glyphs (with
positioning info) and/or draws the glyphs on the appropriate device. (The font
must be identified in the input unless some adequate set of glyph IDs can be
assumed.)
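
A minimal sketch of what such an interface might look like, under stated
assumptions: the render() function, the FONTS table, the font name "DemoSans",
the glyph names and the advance widths are all hypothetical. The
language-dependent behaviour shown (suppressing the fi ligature for Turkish,
where dotted and dotless i must stay distinct) is just one example of why the
characters alone are not enough.

```python
from dataclasses import dataclass


@dataclass
class PositionedGlyph:
    name: str  # glyph name within the chosen font
    x: int     # horizontal pen position in font units


# Hypothetical font data, invented for this example.
FONTS = {
    "DemoSans": {
        "cmap": {"f": "f", "i": "i", "n": "n"},
        "advance": {"f": 300, "i": 250, "n": 550, "f_i": 520},
        "ligatures": {("f", "i"): "f_i"},
        "no_ligature_langs": {"tr"},  # assumed: no fi ligature for Turkish
    }
}


def render(text, font_id, lang):
    """Return positioned glyphs for text in the given font and language."""
    font = FONTS[font_id]
    names = [font["cmap"][ch] for ch in text]

    if lang not in font["no_ligature_langs"]:
        merged, i = [], 0
        while i < len(names):
            pair = tuple(names[i:i + 2])
            if pair in font["ligatures"]:
                merged.append(font["ligatures"][pair])
                i += 2
            else:
                merged.append(names[i])
                i += 1
        names = merged

    result, x = [], 0
    for name in names:
        result.append(PositionedGlyph(name, x))
        x += font["advance"][name]  # advance the pen by the glyph's width
    return result
```

With these made-up tables, render("fin", "DemoSans", "en") returns two glyphs
(the ligature, then n), while render("fin", "DemoSans", "tr") returns three.
The application code is identical in both cases.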

>Wouldn't it be considerably simpler to just add a dotless j to the Unicode
>standard so that font designers become motivated to include it in the fonts?

That's not the motivation font designers need. Font designers have to, and do
(perhaps implicitly), design with one or more specific writing systems in mind.
If they decide to design for a writing system that uses j with various
diacritics, they will include a dotless j glyph, and/or whatever is needed to
present the j with those diacritics.

I say that the font designer must have particular writing systems in mind,
meaning that they shouldn't attempt to create fonts for the general case. Aside
from the fact that the latter would produce unwieldy, very large fonts (MS's
Arial Unicode is something like 23MB, and it doesn't include extra
presentation forms), it may not be possible with current font technology:
TrueType fonts, at least, can have at most 64K glyphs. A font designed for all
of Unicode, presentation forms included, would need more glyphs than that, even
if planes 1 to 14 are ignored. So, we don't want to motivate designers to include
dotless j in general - if they add dotless j for you today, they'll be asked to
add i- and o-width overstriking accents for someone else tomorrow, and on and
on. If they don't aim for a particular target, they'll never hit anything.
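
The glyph-count argument can be sketched as simple arithmetic. The limit below
reflects my understanding that TrueType stores the glyph count in an unsigned
16-bit field; the glyph_budget() helper and the per-script counts passed to it
are placeholders, not real figures for any font or script.

```python
# Largest value an unsigned 16-bit glyph count can hold (the "64K" limit).
MAX_TRUETYPE_GLYPHS = 0xFFFF


def glyph_budget(per_script_counts):
    """Return (total glyphs needed, glyphs left before hitting the limit).

    per_script_counts maps a script name to the number of glyphs the font
    would need for it, including unencoded presentation forms.
    A negative remainder means the font cannot be built as one TrueType font.
    """
    total = sum(per_script_counts.values())
    return total, MAX_TRUETYPE_GLYPHS - total
```

For instance, with placeholder counts of 60000 glyphs for encoded characters
plus 10000 for extra presentation forms, the remainder is negative, so the
design would not fit in a single font.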

[from a subsequent message also by Adam:]

>I doubt I will sound more convincing to them when I tell them they need to
>parse the fonts directly and remove dots from certain letters just because it is
>cast in stone that Unicode only deals with characters, never with glyphs...

>If we don't KISS, many programmers will refuse to embrace us.

I wholly agree. See my second comment above. Application programmers will need
to make some adjustments to deal with Unicode, but these can be kept to a
minimum if they are provided with appropriate enabling technologies. (Of course,
some programmers need to come to the rescue of their peers and build those
technologies.)

Peter


