Re: UTF-8 support in the X Window System

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Wed Jun 23 1999 - 21:01:16 EDT

Next message: Kenneth Whistler: "Unicode 3.0 data files (beta) available!"
Previous message: Ienup Sung: "Re: UTF-8 and POSIX"
Maybe in reply to: Markus Kuhn: "UTF-8 support in the X Window System"
Next in thread: Steve Swales: "Re: UTF-8 support in the X Window System"

Hideki Hiura <hiura@bakabon.eng.sun.com>:

HH> As one of architect of x-xi18n working group,I am curious which
HH> part of X11 protocol do you think we need an update [for Unicode
HH> support].

I don't want to speak for Markus, but I don't think he meant the
protocol in particular; he was thinking of X11-related standards in
general. However, here are a few thoughts about the protocol itself.

First and foremost, there is, to my knowledge, no reliable way for a
client to determine the exact set of glyphs that a font has (remember
that the server optimises away metric information for charcell fonts).
The assumption here is that if a font claims to be encoded according
to, say, ISO 8859-1, then it contains glyphs for all of ISO 8859-1,
and no others. This is obviously not true of Unicode-encoded fonts.
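
For illustration, here is a small Python sketch of the heuristic a
client is left with. It assumes the Xlib convention that a nonexistent
glyph is reported with all-zero metrics, and it models the charcell
case (per-glyph metrics optimised away) as None. The function name and
the tuple layout are made up for this example.

```python
# Sketch only: infer glyph coverage from QueryFont-style per-glyph metrics.
# Assumption: a nonexistent glyph has every metric field set to zero.
# Each entry is (lbearing, rbearing, width, ascent, descent, attributes).

def glyphs_present(first, per_char):
    """Return the set of font indices that appear to have a glyph.

    first    -- font index of the first entry in per_char
    per_char -- list of metric tuples, or None when the server sent
                no per-glyph metrics (the charcell case)
    """
    if per_char is None:
        # Every index looks identical, so coverage cannot be determined.
        return None
    return {first + i for i, metrics in enumerate(per_char) if any(metrics)}


blank = (0, 0, 0, 0, 0, 0)
glyph = (0, 6, 6, 9, 0, 0)
print(glyphs_present(0x41, [glyph, blank, glyph]))
print(glyphs_present(0x41, None))
```

The second call is the case described above: for a charcell font the
client gets no per-glyph information and cannot tell which glyphs exist.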

Another problem is that the only way to determine font and glyph
metrics is through QueryFont, which (in the non-charcell case) returns
metrics for all the font indices between the smallest and the largest
in the font, at the rate of 12 bytes per glyph (if memory serves).
This means that if you have a Unicode-encoded font which covers only
the ASCII repertoire plus a few glyphs at high codepoints (say, the
common fi and fl ligatures at U+FB01 and U+FB02), QueryFont leads to a
transfer of about 700K of metric information.
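
As a back-of-the-envelope check of that figure, the Python sketch
below assumes linear indexing from the smallest to the largest font
index and 12 bytes per glyph entry (six 16-bit fields). The function
name is invented for the example.

```python
# Rough estimate of the per-glyph metric data in a QueryFont reply.
# Assumptions: one entry for every index between the smallest and the
# largest in the font, and 12 bytes per entry.

CHARINFO_BYTES = 12  # six 16-bit metric fields per glyph


def queryfont_metric_bytes(codepoints):
    """Bytes of metrics if every index in [min, max] gets an entry."""
    first, last = min(codepoints), max(codepoints)
    return (last - first + 1) * CHARINFO_BYTES


# Printable ASCII plus the fi (U+FB01) and fl (U+FB02) ligatures.
sparse_font = list(range(0x20, 0x7F)) + [0xFB01, 0xFB02]

total = queryfont_metric_bytes(sparse_font)
useful = len(sparse_font) * CHARINFO_BYTES
print(total, useful)
```

Under these assumptions the reply carries 770,724 bytes of metrics, of
which only 1,164 bytes describe glyphs the font actually has. With
two-byte matrix indexing the exact count depends on the row and column
bounds, so this is an order-of-magnitude figure.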

Finally, I have no idea how X11 fonts can provide more than one glyph
per codepoint. More flexible font encoding schemes are needed for
ligatures and combining marks, as well as language-specific variations
(say, different glyphs for the Unihan range, or distinguishing between
Spanish `o with acute' and Polish `o kreskowane'; no, I won't mention
umlauts and diereses). However, this problem may not be too difficult
to solve, either with an extension that provides OpenType-like
capabilities, or by using something other than raw Unicode for the
font encoding (what?).
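
To make the idea concrete, here is a toy Python sketch of what
"more than one glyph per codepoint" could look like: a variant table
keyed on codepoint and language, followed by a ligature substitution
pass. Everything here (table names, glyph names, the language tags)
is hypothetical and does not correspond to any existing X11 or
OpenType interface.

```python
# Hypothetical glyph selection: none of these names exist in X11.

# (codepoint, language) -> glyph name; language None is the default.
VARIANTS = {
    (0x00F3, None): "oacute",
    (0x00F3, "pl"): "oacute.pl",  # Polish kreska form
}

# Pairs of adjacent glyph names that are replaced by one ligature glyph.
LIGATURES = {
    ("uni0066", "uni0069"): "f_i",
    ("uni0066", "uni006C"): "f_l",
}


def select_glyph(codepoint, lang=None):
    """Pick a language-specific glyph, then the default, then a generic name."""
    return (VARIANTS.get((codepoint, lang))
            or VARIANTS.get((codepoint, None))
            or "uni%04X" % codepoint)


def shape(text, lang=None):
    """Map text to glyph names, merging adjacent pairs found in LIGATURES."""
    out = []
    for ch in text:
        g = select_glyph(ord(ch), lang)
        if out and (out[-1], g) in LIGATURES:
            out[-1] = LIGATURES[(out[-1], g)]
        else:
            out.append(g)
    return out


print(shape("fi\u00f3", "pl"))
print(shape("fi\u00f3", "es"))
```

The point of the sketch is only that the mapping from text to glyphs
is no longer one codepoint to one font index; a real design would also
have to handle combining marks.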

HH> due to the X Consortium's merger with OpenGroup, such
HH> implementations had never been returned to vanilla X11R6.x.

Could you please explain this? (Possibly by private mail.)

MK> - UTF-8 selections have the new UTF8_STRING type

(It's both a selection target and a property type, to be pedantic.)

HH> FYI: What X-i18n WG planned to support Unicode string right before X
HH> Consortium was tapering out is to define CompoundText2, which directly
HH> transfer UTF16.

I guess it doesn't much matter whether UTF-8 or UTF-16 is used, as
long as all clients use the same encoding. The main reason I proposed
UTF-8 is that all the clients I've had the occasion to play with use
UTF-8, albeit in subtly incompatible ways.

CompoundText2 -- what a horrible name!

Sincerely,

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT