UTF-8 support in the X Window System

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Mon Jun 21 1999 - 18:05:47 EDT

Are there any engineers from X.Org companies such as Sun, HP, SCO, IBM, etc.
who have a strong interest in seeing proper Unicode/UTF-8 support in the
X11 protocol specification and in the Xlib sample implementation?

I think it is time to set up a working group that takes care of filling
in the (very few but still crucial) missing bits in the X protocol and
Xlib to enable excellent interoperability between the many upcoming X11
clients with Unicode support.

There are a number of urgent issues that have to be settled with the
X.Org group, and I am rather clueless about how to proceed with this in
an orderly way. For example, as part of the xterm UTF-8 extension, Julius
Chroboczek and I have defined a few conventions that really should, in
some form, find their way into the next revision of the X protocol:

  - UTF-8 selections have the new UTF8_STRING type (because STRING is
    by definition only Latin-1 in the existing standard). This allows
    selection interoperability between Unicode and Latin-1 applications.
  - Any Unicode character in the range U+000000 to U+FFFFFF now has
    a keysym code assigned (at least for xterm), obtained by adding
    0x01000000 to its code point, so that you can associate any Unicode
    character with a key and not just the tiny subset for which keysyms
    exist.

There is currently no support for ISO 10646-1/UTF-8 whatsoever in the
X11R6.4 sample implementation (there are only a few now-obsolete
sketches of UTF-1 support). It will be necessary to implement UTF-8 as
one of the supported multi-byte encodings, so that xterm can get UTF-8
strings directly from XmbLookupString(). That would allow us to remove
the current keysym->UTF-8 hack from xterm and put proper Unicode
keyboard support into Xlib (with hex entry, full support of the compose
key, etc.).

I think it is rather important that an X.Org working group is set up to
properly include support for UTF-8 into all the specifications and the
sample implementation. Doing so seems to be rather straightforward, but
it has to be done properly to enable interoperable use of ISO 10646-1
under X.

How do we get this going?


Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
