RE: Xterm and UCS combining characters

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Fri Jun 18 1999 - 08:51:59 EDT

The attached function omits the Hebrew combining marks (0591-05c4): the
accents (0591-05af) and the points (05b0-05c4). However, they may safely be
ignored: the Israeli standard says that if you cannot display them, then
don't; just don't lose them when passing the text on.
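
As a rough sketch only, a test for the Hebrew range could look like the
following. The name ishebrewcombining is made up for illustration and is not
part of the quoted code. I am also assuming that maqaf (05be), paseq (05c0)
and sof pasuq (05c3), which fall inside the range but are spacing
punctuation, should be left out; unassigned code points in the range are
not filtered.

```c
/*
 * Sketch: test for Hebrew combining marks, U+0591..U+05C4.
 * Assumption: U+05BE (maqaf), U+05C0 (paseq) and U+05C3 (sof pasuq)
 * are spacing punctuation, so they are excluded here.
 * Returns 1 for a combining mark, 0 otherwise.
 */
int ishebrewcombining(int ucs)
{
    if (ucs < 0x0591 || ucs > 0x05c4)
        return 0;
    return ucs != 0x05be && ucs != 0x05c0 && ucs != 0x05c3;
}
```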

Jony

> -----Original Message-----
> From: Markus Kuhn [mailto:Markus.Kuhn@cl.cam.ac.uk]
> Sent: Thursday, June 17, 1999 4:27 PM
> To: Unicode List
> Cc: fonts@xfree86.org; unicode@unicode.org
> Subject: Re: Xterm and UCS combining characters
>
>
> Theppitak Karoonboonyanan wrote on 1999-06-14 04:12 UTC:
> > Markus Kuhn wrote:
> > > Unicode/ISO 10646-1 (Level 1) support for Linux and Unix under X11 is
> > > one important step further. The latest development revision of the xterm
> > > version distributed by the XFree86 project can now handle 16-bit
> > > ISO10646-1 fonts and can do screen output, keyboard input, as well as
> > > cut&paste all in UTF-8.
> > > <http://www.clark.net/pub/dickey/xterm/xterm.tar.gz>
> >
> > Great! I've tried it and really wish Thai could be shown there.
> >
> > I have tried adding Thai glyphs to my local copy of your UCS font, just
> > for the experiment, and have found that it displays fairly well.
> >
> > However, it's not as easy as I first thought. I solved the problem of
> > Thai combining characters by giving them a negative x value in the
> > origin, and prevented them from occupying horizontal space by setting
> > DWIDTH and SWIDTH to zero.
> >
> > And it works as I expected: the characters are combined. However, they
> > are always followed by a space (which belongs to the combining character).
> > It seems to need some hacking in the xterm code itself to make it work
> > properly.
> >
> > What should we do? Is it possible to enable Thai?
>
> As you have seen, xterm does not yet have any support for combining
> characters. We are not yet sure exactly how this is best done, because
> the BDF font format clearly was not designed with combining characters
> in mind. Some combining characters can probably be generated by just
> OR-ing the bitmaps of the base glyph and the combining character
> together, and it would be relatively easy to extend xterm to this
> effect. However, with many existing fonts this will not work, because
> diacritical marks will not fit over the glyphs of the normal base
> characters; in precomposed glyphs the base characters are drawn smaller
> for this purpose (this is what is done in most of the ISO 8859-1 fonts).
> Also, sometimes the position of the diacritical marks should vary based
> on the base characters, which is why for instance the TeX fonts contain
> auxiliary information for every glyph to aid in correct positioning of
> accents.
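
A minimal sketch of the OR-ing idea, under assumptions that are not in the
message: both glyphs occupy the same fixed cell and are stored as rows of
bytes with one bit per pixel. The function name or_glyphs and the layout
are hypothetical and are not how xterm or the X server store glyphs.

```c
#include <stddef.h>

/*
 * Sketch: overlay a combining mark on a base glyph by OR-ing bitmaps.
 * Assumption (hypothetical layout): both glyphs share the same cell size
 * and are stored as `rows` rows of `bytes_per_row` bytes, one bit per pixel.
 * The result is written to dst, which may alias base.
 */
void or_glyphs(unsigned char *dst, const unsigned char *base,
               const unsigned char *mark, size_t rows, size_t bytes_per_row)
{
    size_t i, n = rows * bytes_per_row;

    for (i = 0; i < n; i++)
        dst[i] = base[i] | mark[i];
}
```

This only looks right when the mark's pixels fall in rows that the base
glyph leaves empty, which is exactly the font design problem described
above.
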
>
> My suggestion would be the following:
>
> We could modify xterm such that combining characters just overstrike the
> glyph that was displayed by the immediately preceding character and do
> not advance the cursor. There should be a command line option that (for
> debugging purposes) forces combining characters to be treated like
> normal characters (as they are at the moment).
>
> The following C function determines whether a Unicode character is a
> non-spacing combining character that puts an accent on the preceding
> character instead of advancing the cursor:
>
> /*
>  * This function tests whether the ISO 10646/Unicode character code ucs
>  * represents a combining character (return 1) or not (return 0).
>  */
> int iscombining(int ucs)
> {
>     return
>         (ucs >= 0x0300 && ucs <= 0x036f) || /* combining diacritical marks */
>         (ucs >= 0x0483 && ucs <= 0x0489) || /* combining marks for Cyrillic */
>         (ucs >= 0x20d0 && ucs <= 0x20ff) || /* combining marks for symbols */
>         (ucs >= 0x3099 && ucs <= 0x309a) || /* combining Katakana-Hiragana marks */
>         (ucs >= 0xfe20 && ucs <= 0xfe2f);   /* combining half marks */
> }
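
To illustrate how such a test might drive cursor movement, here is a small
sketch that counts the character cells a string needs when combining
characters overstrike the preceding cell. It assumes the input is already
decoded from UTF-8 into code points and that every non-combining character
is exactly one cell wide. cell_count is a made-up name, and the ranges are
copied from the function above only so the fragment compiles on its own.

```c
#include <stddef.h>

/* Copy of the ranges from the quoted function, so the sketch stands alone. */
static int iscombining(int ucs)
{
    return
        (ucs >= 0x0300 && ucs <= 0x036f) ||
        (ucs >= 0x0483 && ucs <= 0x0489) ||
        (ucs >= 0x20d0 && ucs <= 0x20ff) ||
        (ucs >= 0x3099 && ucs <= 0x309a) ||
        (ucs >= 0xfe20 && ucs <= 0xfe2f);
}

/*
 * Sketch: number of character cells used by len code points when
 * combining characters do not advance the cursor.
 * Assumption: every other character is exactly one cell wide.
 */
size_t cell_count(const int *ucs, size_t len)
{
    size_t i, cells = 0;

    for (i = 0; i < len; i++)
        if (!iscombining(ucs[i]))
            cells++;
    return cells;
}
```

For example, the three code points 0x0065, 0x0301, 0x0078 (e, combining
acute accent, x) would need two cells under these assumptions.
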
>
> We then also have to make fonts with enough white space above and below
> even the capital letters that it is not necessary to shrink a character
> to put an accent over it. The fonts that we have done so far were not
> designed to allow overstriking with combining characters, but we can
> make new fonts that will support this, at least to some degree.
> Question: How do we indicate in the XLFD that this font was designed
> for overstriking combining characters?
>
> Suggested new command line options:
>
> -co   disable overstriking by combining characters and treat them
>       like normal characters (as xterm does at the moment)
> +co   ISO 10646 combining characters shall overstrike the character
>       displayed immediately before them. Note that this will only
>       look good with fonts that were designed for this.
>
> Not only Thai users, but also mathematicians who like to put vector
> arrows etc. over almost everything would certainly appreciate at least
> some level of support for combining characters.
>
> Note that this is not at all a trivial extension. Xterm stores the
> entire screen matrix to allow redraws, and with combining characters,
> more than one character can be associated with a single
> character cell of the display matrix. An auxiliary data structure is
> necessary to allow at least one additional combining character to be
> associated with each character cell. In your DWIDTH/SWIDTH hack, you
> used the following cell for this purpose, which is why you saw a space
> there.
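
One possible shape for such an auxiliary structure, purely as a sketch:
each cell holds a base character plus at most one combining character, and
a combining character is attached to the cell left of the cursor without
advancing it. The names struct cell, struct row, row_put and COLS are
hypothetical and do not reflect xterm's actual screen buffer. The caller is
assumed to have already decided whether the character is combining.

```c
#include <stddef.h>

#define COLS 80   /* hypothetical screen width */

/* One character cell: a base character plus at most one combining mark. */
struct cell {
    int base;       /* UCS code of the spacing character, 0 if empty */
    int combining;  /* UCS code of the overstruck mark, 0 if none */
};

struct row {
    struct cell cells[COLS];
    size_t cursor;  /* column where the next spacing character goes */
};

/*
 * Store one character in the row. is_comb is the caller's verdict on
 * whether ucs is a combining character. A combining character is attached
 * to the cell left of the cursor and does not advance it; a second mark
 * on the same cell simply replaces the first.
 * Returns 0 on success, -1 if the character could not be stored.
 */
int row_put(struct row *r, int ucs, int is_comb)
{
    if (is_comb) {
        if (r->cursor == 0)
            return -1;            /* nothing to attach the mark to */
        r->cells[r->cursor - 1].combining = ucs;
        return 0;
    }
    if (r->cursor >= COLS)
        return -1;                /* row full; wrapping is not modelled */
    r->cells[r->cursor].base = ucs;
    r->cells[r->cursor].combining = 0;
    r->cursor++;
    return 0;
}
```

With this layout a redraw can paint the base glyph and then overstrike the
stored mark in the same cell, so no following cell is consumed.
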
>
> I don't think there is any existing practice on how to handle combining
> characters in VT100 emulators; all the ones I have seen so far
> implement only ISO 10646-1 Level 1. Looks like an interesting thing to
> experiment with.
>
> Markus
>
> --
> Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
> Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
>

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT