From: Ed Trager (firstname.lastname@example.org)
Date: Fri Feb 08 2008 - 14:02:11 CST
Just a few brief comments on this thread:
> Having flown halfway around the world to talk to people who for whatever
> reasons, both valid and invalid (and not really distinguishing which is
> which on their list of concerns), are unhappy with a language encoding that
> in their view doubles or worse the amount of bytes used to store their
> language in Unicode, I can tell you that this is a very real concern on some
> people's minds.
> True or false, it is on their minds. They can all add and multiply, and it
> certainly looks like a 2x or 3x situation to them.
Of course it is on their minds! Judging from the titles of emails in
my spam box, size really does matter. But apparently what humanity
really wants to do is MAXIMIZE the size, not minimize it. So a 2x or
3x situation should be good. :-)
On Feb 8, 2008 5:52 AM, Sinnathurai Srivas <email@example.com> wrote:
> My question was: most proper publishing software does not yet support
> complex rendering. How many years has it been since Unicode came into being?
> When is this going to be resolved, or do we plan on choosing an alternative
> encoding, since Unicode is not working?
Unicode does in fact work very well. Implementing good Unicode
support for complex text layout (CTL) scripts like Tamil is
achievable. I am not sure what "proper publishing software" includes.
For example, would that include http://ta.wikipedia.org/?
From an economic perspective, when the markets in South and Southeast
Asia that require complex text layout look enticing enough to the
software vendors, then the problem will be solved. Is it possible
that rampant piracy of commercial software throughout Asia actually
contributes to the problem of poor support for many Asian scripts in
heavyweight commercial software like Adobe InDesign? This question
might be a great topic for some student's research paper.
Clearly, the commercial players like Adobe InDesign and QuarkXPress
and the non-commercial players like Scribus (http://www.scribus.net/)
are all working on providing support for CTL scripts. In this arena,
the Open Source players are influenced by a different set of driving
criteria than the commercial vendors: Does being Open Source encourage
faster development of non-Latin script support? This question might
be a great topic for some other student's research paper.
In any case, the transparency of development in the Open Source world
allows one to find out exactly how things stand. For example, here is
the link to Scribus' "Support for Non-Latin Languages" meta-bug page:
And in the case of Scribus, for example, one is welcome to contribute
well-documented test cases (sample Unicode text along with references
to fonts that are known to work correctly in other software) which the
developers can use for testing the software.
> As for bitmap, I meant the "rigidly-fixed-width-character" requirements.
> At present, the complex rendering (which is not working yet in these
> systems) will produce extremely large-width glyphs which will not be
> accommodated by the "rigidly-fixed-width" requirements. What is the plan to
> resolve this?
The only place I can think of where "rigidly fixed width" characters
are normally required is in terminal emulators. Once upon a time I
investigated the idea of creating a terminal emulator, along with a
bitmap font, that would support scripts like Myanmar (Burmese),
Tamil, etc. (Actually, from time to time, I still return to this
idea.)
In existing terminal emulators, Latin glyphs take up one character
cell each, while CJK glyphs are "double-width" and take up 2 character
cells each. The GNU Unifont BMP bitmap font originally designed by
Roman Czyborra (http://en.wikipedia.org/wiki/GNU_Unifont) provides a
good example of how this works: most of the glyphs are 8 pixels wide
by 16 pixels high, but the CJK glyphs are 16 pixels wide by 16 pixels
high.
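The 1-cell versus 2-cell convention above is what terminals compute
with a lookup like the C library's wcwidth() function. Here is a
minimal Python sketch, assuming a simplified rule built on the Unicode
East Asian Width property from the standard unicodedata module. Real
terminal emulators use more detailed tables, so cell_width and
string_width are illustrative names only:

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Approximate number of terminal cells one code point occupies.

    Simplified rule (an assumption, not a full wcwidth implementation):
    - nonspacing/enclosing marks (categories Mn, Me) take 0 cells,
      since they stack on the preceding character;
    - East Asian Wide (W) and Fullwidth (F) characters take 2 cells,
      like the 16x16 CJK glyphs in GNU Unifont;
    - everything else takes 1 cell, like Unifont's 8x16 glyphs.
    """
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(text: str) -> int:
    """Total number of cells needed to display `text` under the rule above."""
    return sum(cell_width(ch) for ch in text)


if __name__ == "__main__":
    for sample in ["A", "\u4e2d", "\u0634", "e\u0301"]:
        print(repr(sample), string_width(sample))
```

Note that under this rule Arabic SHEEN (U+0634) gets a single cell,
which is exactly the squeeze described further on.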
In the hypothetical system as I had envisioned it, glyphs other than
CJK glyphs could also be double-width. And, in fact, why limit
ourselves to widths of 1 and 2 character cells? When I was
investigating Myanmar, I thought that it actually would be *better* to
allow some glyphs to stretch across 3 or even 4 character cells.
We can think of this hypothetical terminal emulator as having a
Cartesian grid in which glyphs of all scripts need to fit into a
discrete number of "quantum" cells: 1, 2, 3, or 4. (Maybe one could
even make an argument for some glyphs using up 5 quantum cells?)
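One way to picture such a grid is as a row of cells where a glyph
owns its leftmost cell and marks the cells it covers to the right,
which is broadly how terminals commonly store double-width CJK
characters today. This is a minimal sketch only: QuantumRow,
CONTINUATION, and the 4-cell cap (MAX_QUANTA) are hypothetical names
and choices, not part of any existing terminal:

```python
from typing import List, Optional

MAX_QUANTA = 4     # assumption: a glyph spans 1 to 4 cells (5 is left open)
CONTINUATION = ""  # hypothetical marker for a cell covered by a glyph to its left


class QuantumRow:
    """One row of a hypothetical 'quantum' terminal grid.

    Each cell holds None (empty), a glyph string (the glyph's leftmost
    cell), or CONTINUATION (a cell covered by the glyph to its left).
    """

    def __init__(self, columns: int) -> None:
        self.cells: List[Optional[str]] = [None] * columns

    def place(self, col: int, glyph: str, quanta: int) -> int:
        """Put `glyph` at `col` spanning `quanta` cells; return the next column."""
        if not 1 <= quanta <= MAX_QUANTA:
            raise ValueError(f"a glyph must span 1..{MAX_QUANTA} cells, got {quanta}")
        if col < 0 or col + quanta > len(self.cells):
            raise ValueError("glyph does not fit in this row")
        self.cells[col] = glyph
        for c in range(col + 1, col + quanta):
            self.cells[c] = CONTINUATION
        return col + quanta

    def glyph_at(self, col: int) -> Optional[str]:
        """Return the glyph covering `col`, walking left over continuation cells."""
        while col > 0 and self.cells[col] == CONTINUATION:
            col -= 1
        return self.cells[col]


if __name__ == "__main__":
    row = QuantumRow(8)
    col = row.place(0, "a", 1)       # Latin: 1 cell
    col = row.place(col, "\u4e2d", 2)  # CJK: 2 cells
    col = row.place(col, "X", 3)     # a hypothetical 3-cell glyph
    print(col, row.glyph_at(4))
```

Keeping the width on the grid rather than only in the font is what
would let cursor movement and line wrapping agree with what is drawn.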
An experienced font designer (or team of designers) would then take up
the challenge of creating a font to use with this terminal emulator.
The font need not be a bitmap font -- it could just as easily be a
vector font. For the sake of argument, let's say we allow this
hypothetical terminal to use vector fonts (i.e., we could just make a
special kind of OpenType font which could even have embedded bitmaps
So for the various Latin blocks of Unicode we could start out with a
suitable "monospaced" font. In a Latin monospaced font, all letters
fit into fixed-width cells so that the advance distances on all glyphs
are the same. This obviously requires some special aesthetic
compromises, especially on the wide Latin letters like "m" and "w".
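To put a number on that compromise: if every glyph gets the same
advance, a letter whose natural design is wider than the cell has to
be compressed horizontally by the ratio of cell width to natural
width. A minimal sketch, assuming Unifont's 8-pixel narrow cell and
made-up natural widths for a few Latin letters (these are illustrative
values, not measurements of any real font):

```python
CELL_WIDTH = 8  # pixels, as in GNU Unifont's narrow 8x16 glyphs

# Hypothetical natural (proportional) advance widths, in pixels.
NATURAL_WIDTH = {"i": 4, "n": 8, "m": 12, "w": 11}


def squish_factor(natural_width: float, cells: int = 1,
                  cell_width: float = CELL_WIDTH) -> float:
    """Horizontal scale needed to fit a glyph into `cells` cells.

    Below 1.0 the glyph must be compressed; above 1.0 it has spare room
    and would be centered or spread out instead.
    """
    return (cells * cell_width) / natural_width


if __name__ == "__main__":
    for letter, width in NATURAL_WIDTH.items():
        print(letter, round(squish_factor(width), 2))
```

With these assumed numbers, "m" ends up at two thirds of its natural
width while "i" has twice the room it needs.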
To this originally "monospaced" font, we would now add additional
blocks of Unicode. We could pretty much continue working within our
"monospaced" design mantra through many blocks of Unicode -- until, of
course, we hit scripts like Devanagari, Tamil, Myanmar, Khmer, and so
on. Arabic too. At this point, our originally "monospaced" font
is no longer "monospaced". Let's give it a new name -- how about
"quantized font" or "quantum spaced font"? Or simply "quantum font"?
In this new quantum font, whenever an individual glyph became too
horribly "squished" to fit inside one quantum character cell, then we
would automatically try a 2-cell approach, and if even that did not
work, then go for a 3- or 4-cell approach.
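The "try 1 cell, then 2, then 3 or 4" rule can be written as a small
search for the smallest cell count that keeps the squish within
bounds. A minimal sketch, assuming a hypothetical tolerance
(MIN_ACCEPTABLE = 0.8, i.e. at most 20% horizontal compression) and
the 4-cell cap; how much squishing is "too horribly squished" is a
design decision for the font designer, not something fixed here:

```python
MAX_QUANTA = 4        # the hypothetical cap on cells per glyph
MIN_ACCEPTABLE = 0.8  # hypothetical: tolerate at most 20% compression


def quantize(natural_width: float, cell_width: float,
             min_scale: float = MIN_ACCEPTABLE,
             max_quanta: int = MAX_QUANTA) -> int:
    """Smallest number of cells (1..max_quanta) that holds a glyph of
    `natural_width` without compressing it below `min_scale`.

    If even `max_quanta` cells are not enough, fall back to `max_quanta`
    and accept the extra compression.
    """
    if natural_width <= 0 or cell_width <= 0:
        raise ValueError("widths must be positive")
    for n in range(1, max_quanta + 1):
        if (n * cell_width) / natural_width >= min_scale:
            return n
    return max_quanta


if __name__ == "__main__":
    for width in (8, 9, 12, 22, 50):
        print(width, quantize(width, cell_width=8))
```

With an 8-pixel cell, a glyph whose natural width is 12 pixels would
get 2 cells and one of 22 pixels would get 3.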
As a quick and familiar example, let's use Arabic script. On Linux,
the mlterm folks (http://mlterm.sourceforge.net/) have actually
produced a "multilingual" terminal that even handles RTL Arabic. This
is pretty cool. Mlterm uses GNU Unifont for its Arabic glyphs.
Arabic in mlterm is readable, which is nice, but it is really ugly.
For example, terminal ARABIC LETTER SHEEN ش looks almost unbearably
*squished*. Clearly, wide Arabic letters like isolated or terminal
ARABIC LETTER SHEEN ش or ARABIC LETTER SAAD ص would probably end up
looking *much* nicer if we just allowed them to occupy 2 character
cells. So, in this quantum font, most Arabic letters would still
occupy just one character cell, but a few would occupy up to 2
cells.
A similar principle would apply to the creation of the necessary
glyphs for scripts like Myanmar and Tamil -- except in these cases
there would be some glyphs that would necessarily take up 3 or even 4
character cells.
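For the terminal to lay text out correctly, it would need the same
quantum widths the font uses. A minimal sketch, assuming a
hypothetical override table that gives ARABIC LETTER SHEEN (U+0634)
and ARABIC LETTER SAD (U+0635) 2 cells and falls back to a simplified
East Asian Width rule otherwise. One simplification to flag: the
discussion is about isolated and final *forms*, which only exist after
Arabic shaping, so a real design would assign widths per shaped glyph
rather than per code point as done here:

```python
import unicodedata

# Hypothetical "quantum width" overrides: code point -> number of cells.
# Only the two Arabic letters discussed are listed; Myanmar or Tamil
# glyphs needing 3 or 4 cells would be added the same way.
QUANTUM_WIDTHS = {
    "\u0634": 2,  # ARABIC LETTER SHEEN
    "\u0635": 2,  # ARABIC LETTER SAD
}


def quantum_width(ch: str) -> int:
    """Cells for one code point: override table first, then a simplified default."""
    if ch in QUANTUM_WIDTHS:
        return QUANTUM_WIDTHS[ch]
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks stack on the previous glyph
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def line_width(text: str) -> int:
    """Total cells for a line of (unshaped) text under this table."""
    return sum(quantum_width(ch) for ch in text)


if __name__ == "__main__":
    print(line_width("\u0634\u0645\u0633"))  # SHEEN, MEEM, SEEN
```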
Well, that's my idea, for what it is worth. I even tried my hand at
creating a set of bitmap glyphs for Myanmar which could be added to
GNU Unifont. But after wasting a lot of time on this, I realized I
did not know how to write a terminal emulator. So, maybe someday I
will return to this outlandish project. After I have learned how to
write a terminal emulator.
- Ed Trager