Xterm now has UTF-8 support

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Wed Jun 09 1999 - 15:59:37 EDT


Good news:

Unicode/ISO 10646-1 (Level 1) support for Linux and Unix under X11 has
moved an important step forward. The latest development revision of the
xterm version distributed by the XFree86 project can now handle 16-bit
ISO10646-1 fonts and can do screen output, keyboard input, and
cut&paste, all in UTF-8.

Here is how you can try it out very quickly yourself:

Get the xterm source code from

  http://www.clark.net/pub/dickey/xterm/xterm.tar.gz

(that is patch version #106 or higher), untar it, and compile it with

  ./configure --enable-wide-chars ; make

Also get from

  http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

a set of ISO10646-1 versions of the default xterm fonts. The recommended
font in there is 6x13.pcf.gz, which is complete. The larger 9x15.pcf.gz
and 10x20.pcf.gz fonts are still under development, but they are already
quite advanced (>2000 characters) and can also be used. Install at least
one of these ISO10646-1 fonts as described in the README file.

Now start xterm with the -u8 option and select an ISO10646-1 font, for
instance as in

 xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
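
The charset of an X font is given by the last two fields of its long
name, so a quick check that a name really refers to an ISO10646-1 font
can be sketched as below. The function name xlfd_charset is made up for
this illustration; it is not part of xterm or of the font package.

```sh
# Print the charset registry and encoding (the last two dash-separated
# fields) of an X font name such as the one passed to "xterm -fn".
# xlfd_charset is a hypothetical helper name used only in this sketch.
xlfd_charset() {
    printf '%s\n' "$1" | awk -F- '{ print $(NF-1) "-" $NF }'
}

xlfd_charset -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
```

Assuming the fonts have been installed, something like
xlsfonts -fn '*-iso10646-1' should list the names that are available.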

To see some example UTF-8 output, just display the demo files that
come with the fonts, e.g.

  cat utf-8-demo.txt
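
If the demo files are not at hand, a few UTF-8 sequences can also be
written out directly as a rough check. This is only a sketch: the
function name utf8_sample is made up here, and the bytes are given as
octal escapes so that the script itself stays plain ASCII.

```sh
# Emit three non-ASCII characters as raw UTF-8 bytes (octal escapes).
# utf8_sample is a hypothetical name used only for this illustration.
utf8_sample() {
    printf 'euro sign   (U+20AC): \342\202\254\n'   # bytes E2 82 AC
    printf 'Greek alpha (U+03B1): \316\261\n'       # bytes CE B1
    printf 'Cyrillic Zhe (U+0416): \320\226\n'      # bytes D0 96
}

utf8_sample
```

In an xterm started with -u8 and an ISO10646-1 font, each line should
end in a single character; in a terminal that is not in UTF-8 mode, two
or three unrelated characters will appear instead.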

If you have any non-ASCII characters on your keyboard, you can create
UTF-8 files by simply typing them in. xterm maps all X11 keysym codes
onto the corresponding UTF-8 sequences.

If, say, you want to have the euro sign on AltGr-E, just add the line

  keysym e = e NoSymbol EuroSign NoSymbol

to your ~/.Xmodmap file (assuming you have "xmodmap .Xmodmap" in one of
your login scripts). Greek and Cyrillic keyboards should also work
immediately.

In case you are unfamiliar with UTF-8: the ASCII-compatible UTF-8
encoding of Unicode is defined in

  ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/ISO-10646-UTF-8.html
  ftp://ftp.funet.fi/mirrors/nic.nordu.net/rfc/rfc2279.txt

It is the way in which Unicode will be used on Unix systems and will
hopefully replace ASCII and ISO 8859 soon.
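
To make the ASCII compatibility concrete, here is a minimal sketch of
the encoding rule for the 16-bit range that the fonts above cover. The
function name utf8_hex is made up for this illustration, longer
sequences for code points above U+FFFF are deliberately left out, and no
attempt is made to reject surrogate values.

```sh
# Print the UTF-8 encoding of one code point as hex bytes.
# Sketch only: handles U+0000..U+FFFF; utf8_hex is a hypothetical name.
utf8_hex() {
    cp=$(( $1 ))
    if [ "$cp" -gt 65535 ]; then
        echo "utf8_hex: only U+0000..U+FFFF handled in this sketch" >&2
        return 1
    fi
    if [ "$cp" -lt 128 ]; then
        # ASCII range: one byte, identical to the ASCII code
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # two bytes: 110xxxxx 10xxxxxx
        printf '%02x %02x\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else
        # three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02x %02x %02x\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

utf8_hex 0x41      # LATIN CAPITAL LETTER A
utf8_hex 0x20AC    # EURO SIGN
```

Code points below 128 come out as the same single byte they have in
ASCII, which is why existing ASCII files are already valid UTF-8.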

More info on using UTF-8 under Unix will shortly be available at

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

where I will also collect information on how to make applications UTF-8
aware.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


