Re: Xterm and UCS combining characters

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Thu Jun 17 1999 - 10:30:37 EDT


Theppitak Karoonboonyanan wrote on 1999-06-14 04:12 UTC:
> Markus Kuhn wrote:
> > Unicode/ISO 10646-1 (Level 1) support for Linux and Unix under X11 is
> > one important step further. The latest development revision of the xterm
> > version distributed by the XFree86 project can now handle 16-bit
> > ISO10646-1 fonts and can do screen output, keyboard input, as well as
> > cut&paste all in UTF-8. <http://www.clark.net/pub/dickey/xterm/xterm.tar.gz>
>
> Great! I've tried it and really wish Thai could be shown there.
>
> I have tried adding Thai glyphs to my local copy of your UCS font, just
> for the experiment, and have found that it displays fairly well.
>
> However, it's not as easy as I first thought. I solved the problem of
> Thai combining characters by giving them a negative x value in the
> origin, and prevented them from occupying horizontal space by setting
> DWIDTH and SWIDTH to zero.
>
> It works as I expected: the characters are combined. However, they
> are always followed by a space (which belongs to the combining character).
> It seems that the xterm code itself needs some hacking to make this work
> properly.
>
> What should we do? Is it possible to enable Thai?

As you have seen, xterm does not yet support combining characters at
all. It is not yet clear how this is best done, because the BDF font
format was clearly not designed with combining characters in mind. Some
combined glyphs can probably be generated by simply OR-ing the bitmap of
the base character with that of the combining character, and it would
be relatively easy to extend xterm to do this. With many existing fonts,
however, this will not work: the diacritical marks do not fit above the
normal base glyphs, because in fonts with precomposed characters the
base letters of accented characters are drawn smaller to make room (this
is what most ISO 8859-1 fonts do). Also, the position of a diacritical
mark sometimes has to vary with the base character, which is why, for
instance, the TeX fonts contain auxiliary information for every glyph to
aid the correct positioning of accents.
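To illustrate the simple OR-ing approach, here is a minimal sketch. The glyph representation is an assumption made only for this example (it is not xterm's internal format): each glyph is 16 pixel rows, each row a 16-bit word with the most significant bit as the leftmost pixel, and the combining mark is already drawn at the right place within the same cell box. The names `Glyph` and `overlay_glyphs` are hypothetical.

```c
#include <stdint.h>

#define GLYPH_ROWS 16  /* hypothetical cell height in pixels */

/* Hypothetical glyph bitmap: one 16-bit word per pixel row,
 * most significant bit = leftmost pixel. */
typedef struct {
    uint16_t rows[GLYPH_ROWS];
} Glyph;

/* Overlay a combining mark onto a base glyph by OR-ing the bitmaps
 * row by row. This only looks right if the mark was designed to sit
 * inside the cell without colliding with the base letter. */
Glyph overlay_glyphs(const Glyph *base, const Glyph *mark)
{
    Glyph out;
    for (int i = 0; i < GLYPH_ROWS; i++)
        out.rows[i] = base->rows[i] | mark->rows[i];
    return out;
}
```

The limitation described above shows up directly here: if the mark's pixels fall in rows already occupied by a tall capital letter, the OR merges them into an unreadable blob, since nothing moves the accent or shrinks the letter.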

My suggestion would be the following:

We could modify xterm such that combining characters just overstrike the
glyph displayed for the immediately preceding character and do not
advance the cursor. There should be a command line option that (for
debugging purposes) forces combining characters to be treated like
normal characters (as they are at the moment).
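A minimal sketch of this cursor rule, assuming characters arrive as UCS code points and the screen is addressed by column. To keep the example self-contained, `is_combining_mark` is a small stand-in for the `iscombining()` function in this message (it covers only U+0300 to U+036F), and the `overstrike` flag is a hypothetical name for the proposed option.

```c
#include <stddef.h>

/* Stand-in for iscombining(): Combining Diacritical Marks only. */
static int is_combining_mark(int ucs)
{
    return ucs >= 0x0300 && ucs <= 0x036f;
}

/* For each character in text[0..n-1], store in col[] the screen column
 * where it is drawn, starting at column 0. With overstrike enabled, a
 * combining character is drawn over the preceding cell and the cursor
 * does not advance; otherwise every character gets its own cell.
 * A combining character at column 0 has nothing to attach to, so this
 * sketch gives it its own cell. Returns the final cursor column. */
int layout_columns(const int *text, size_t n, int overstrike, int *col)
{
    int cursor = 0;
    for (size_t i = 0; i < n; i++) {
        if (overstrike && cursor > 0 && is_combining_mark(text[i]))
            col[i] = cursor - 1;   /* overstrike the previous glyph */
        else
            col[i] = cursor++;     /* normal spacing character */
    }
    return cursor;
}
```

For "A" followed by U+0301 and "B", overstriking places the accent in column 0 and ends with the cursor at column 2; with overstriking disabled the accent takes column 1 and the cursor ends at column 3, which is the behaviour reported with the DWIDTH/SWIDTH font.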

The following C function determines whether a Unicode character is a
non-spacing combining character, i.e. one that puts an accent on the
preceding character instead of advancing the cursor:

/*
 * This function tests whether the ISO 10646/Unicode character code ucs
 * represents a combining character (return 1) or not (return 0).
 */
int iscombining(int ucs)
{
  return
    (ucs >= 0x0300 && ucs <= 0x036f) || /* combining diacritical marks */
    (ucs >= 0x0483 && ucs <= 0x0489) || /* combining marks for Cyrillic */
    (ucs >= 0x20d0 && ucs <= 0x20ff) || /* combining marks for symbols */
    (ucs >= 0x3099 && ucs <= 0x309a) || /* combining Katakana-Hiragana marks */
    (ucs >= 0xfe20 && ucs <= 0xfe2f); /* combining half marks */
}
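The ranges above do not yet cover Thai, which prompted this thread. As a sketch (an extension of ours, not part of the proposal above), the Thai non-spacing marks in Unicode are U+0E31, U+0E34 to U+0E3A and U+0E47 to U+0E4E (general category Mn), and could be added like this. Other scripts with non-spacing marks, such as Hebrew, Arabic or Devanagari, would need similar entries. The name `iscombining_thai` is hypothetical.

```c
/* Sketch: iscombining() extended with the Thai non-spacing marks. */
int iscombining_thai(int ucs)
{
  return
    (ucs >= 0x0300 && ucs <= 0x036f) || /* combining diacritical marks */
    (ucs >= 0x0483 && ucs <= 0x0489) || /* combining marks for Cyrillic */
    (ucs == 0x0e31) ||                  /* Thai MAI HAN-AKAT */
    (ucs >= 0x0e34 && ucs <= 0x0e3a) || /* Thai vowels above/below, PHINTHU */
    (ucs >= 0x0e47 && ucs <= 0x0e4e) || /* Thai tone marks and other signs */
    (ucs >= 0x20d0 && ucs <= 0x20ff) || /* combining marks for symbols */
    (ucs >= 0x3099 && ucs <= 0x309a) || /* combining Katakana-Hiragana marks */
    (ucs >= 0xfe20 && ucs <= 0xfe2f);   /* combining half marks */
}
```

With this table, a consonant followed by a tone mark, e.g. U+0E01 U+0E48, occupies a single cell, while spacing vowels such as SARA AM (U+0E33) still advance the cursor.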

We would then also need fonts with enough white space above and below
even the capital letters that a character never has to be shrunk to put
an accent over it. The fonts that we have made so far were not designed
for overstriking with combining characters, but we can make new fonts
that support this, at least to some degree. Question: How do we indicate
in the XLFD that a font was designed for overstriking combining
characters?

Suggested new command line options:

  -co   disable overstriking by combining characters and treat them
        like normal characters (as xterm does at the moment)
  +co   ISO 10646 combining characters overstrike the immediately
        preceding character. Note that this will only look good with
        fonts that were designed for this.
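A minimal sketch of how the two proposed options could be recognised, assuming a plain scan of argv. xterm normally handles its options through X Toolkit resource tables, so this is not how the option would actually be wired in; the function name and the default (off, matching the current behaviour) are assumptions.

```c
#include <string.h>

/* Hypothetical: scan argv for the proposed -co / +co options.
 * Returns 1 if combining characters should overstrike, 0 otherwise.
 * The last occurrence wins; with no option the default is 0, i.e.
 * combining characters are treated like normal characters. */
int parse_combining_option(int argc, char **argv)
{
    int overstrike = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-co") == 0)
            overstrike = 0;   /* treat as normal spacing characters */
        else if (strcmp(argv[i], "+co") == 0)
            overstrike = 1;   /* overstrike the preceding character */
    }
    return overstrike;
}
```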

Not only Thai users, but also mathematicians who like to put vector
arrows etc. over almost everything would certainly appreciate at least
some level of support for combining characters.

Note that this is not at all a trivial extension. Xterm stores the
entire screen matrix to allow redraws, and with combining characters,
more than one character can be associated with a single character cell
of the display matrix. An auxiliary data structure is necessary to
associate at least one additional combining character with each
character cell. In your DWIDTH/SWIDTH hack, the combining character
occupied the following cell, which is why you saw a space there.
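One possible layout for such an auxiliary structure, sketched under our own assumptions (xterm's actual screen buffer is organised differently): each cell keeps its base character plus one optional combining character, with 0 meaning none, and a second combining character for the same cell is rejected. `Cell`, `COMB_NONE` and `attach_combining` are hypothetical names.

```c
#include <stddef.h>

#define COMB_NONE 0   /* no combining character attached */

/* Hypothetical screen cell: a base character plus at most one
 * combining character drawn on top of it. */
typedef struct {
    int base;       /* UCS code of the spacing character */
    int combining;  /* UCS code of the combining mark, or COMB_NONE */
} Cell;

/* Attach a combining character to the cell left of the cursor.
 * Returns 1 on success, 0 if there is no preceding cell or it already
 * holds a combining character (this sketch keeps only one per cell). */
int attach_combining(Cell *line, size_t cursor, int ucs)
{
    if (cursor == 0 || line[cursor - 1].combining != COMB_NONE)
        return 0;
    line[cursor - 1].combining = ucs;
    return 1;
}
```

On a redraw, each cell's base glyph would be drawn first and, if `combining` is set, the mark overstruck in the same cell, so no extra column is used up as in the DWIDTH/SWIDTH experiment.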

I am not aware of any existing practice for handling combining
characters in VT100 emulators; all the implementations I have seen so
far support only ISO 10646-1 Level 1. This looks like an interesting
thing to experiment with.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
