This is slightly off-topic, but there are probably plenty of people here
in the know...
Background: I am writing a font rendering engine, and I am trying to write
it to be as general as possible. (The initial purpose is to provide
language learning aids, done as Java applets, to English speakers
learning foreign languages including, probably, Russian, Armenian,
Hebrew, Arabic, Hindi [using Devanagari] and Japanese (vertically
rendered); others later.) I am trying to avoid the 'download this
font/software' syndrome, so I am writing the software to build text
by cutting symbols from a '.gif' image and drawing them correctly.
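For the approach described above (cutting symbols out of a '.gif' and drawing them), a minimal Java sketch might look like the following. It assumes every glyph sits in a fixed-width cell of a single horizontal strip, in an order given by a string of characters; the class name `GlyphStrip` and that cell mapping are hypothetical, and a real strip would need per-glyph widths and offsets.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: text drawn by cutting glyphs out of one strip image.
 * Assumption: each glyph occupies a fixed-width cell, left to right, in the
 * order given by charsInOrder.
 */
public class GlyphStrip {
    private final BufferedImage strip;
    private final int cellWidth;
    private final Map<Integer, Integer> cellIndex = new HashMap<>();

    public GlyphStrip(BufferedImage strip, int cellWidth, String charsInOrder) {
        this.strip = strip;
        this.cellWidth = cellWidth;
        int i = 0;
        for (int cp : charsInOrder.codePoints().toArray()) {
            cellIndex.put(cp, i++); // code point -> cell number in the strip
        }
    }

    /** Returns the cell for one character; getSubimage shares pixels, no copy. */
    public BufferedImage glyph(int codePoint) {
        Integer i = cellIndex.get(codePoint);
        if (i == null) {
            throw new IllegalArgumentException("no glyph for U+" + Integer.toHexString(codePoint));
        }
        return strip.getSubimage(i * cellWidth, 0, cellWidth, strip.getHeight());
    }

    /** Draws text left to right, one cell per character; returns the total advance. */
    public int draw(Graphics2D g, String text, int x, int y) {
        int pen = x;
        for (int cp : text.codePoints().toArray()) {
            g.drawImage(glyph(cp), pen, y, null);
            pen += cellWidth;
        }
        return pen - x;
    }

    public static void main(String[] args) {
        // In practice the strip would be loaded from the .gif
        // (e.g. javax.imageio.ImageIO.read); a blank 3-cell strip stands in here.
        BufferedImage strip = new BufferedImage(3 * 16, 20, BufferedImage.TYPE_INT_ARGB);
        GlyphStrip font = new GlyphStrip(strip, 16, "abc");
        BufferedImage canvas = new BufferedImage(200, 40, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = canvas.createGraphics();
        int width = font.draw(g, "cab", 10, 10);
        g.dispose();
        System.out.println(width); // 48
    }
}
```

Because `getSubimage` returns a view onto the strip rather than a copy, cutting glyphs this way is cheap; the fixed `cellWidth` advance is the obvious thing to replace with a per-glyph width table.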
My PROBLEM is this: I am not a linguist, nor a language encoding expert,
and I can't find any sites or other resources on glyph composition in
the various scripts. I've managed to piece together the apparent rules
of some of the scripts, mostly from staring at phrasebooks and
dictionaries. Some of the questions I'm trying to answer are:
a) Hebrew vowels (when used) are rendered below the consonant that
precedes them (with a few exceptions, e.g. aleph). But if a character
string being rendered contains two (or more) of these vowel characters
in a row, what would be reasonable (standard?) rendering behaviour?
Overtyping the second and subsequent vowels on the first? Ignoring the
extra vowels? Rendering the second consecutive vowel beneath the first,
which is itself under the consonant? Rendering the second and subsequent
vowels under empty spaces (representing the 'missing' consonants)?
b) Basically the same problem, but for Arabic. Also, Arabic seems to
have several character rendering variants that, I guess, would be
called ligatures. How and where are these applied? Is there anywhere I
can find a list of the common ones used in standard published material,
with their glyph shapes? Would these perhaps be represented in Unicode
as 'presentation forms'?
c) I also need to find a list of consonantal composed characters for
d) Does Unicode include any standard way of representing furigana
characters (presumably inline with the main body of a Japanese text)?
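One thing worth checking for (d): Unicode defines three interlinear annotation characters, U+FFF9 INTERLINEAR ANNOTATION ANCHOR, U+FFFA INTERLINEAR ANNOTATION SEPARATOR and U+FFFB INTERLINEAR ANNOTATION TERMINATOR, which can bracket a piece of base text and its annotation (such as a furigana reading) inline. As far as I know the Standard intends them mainly for internal use or by prior agreement rather than for general interchange. The Java sketch below assumes that convention, with one reading per annotation and no nesting; the `Furigana` class and its `Run` type are hypothetical. Its example is 漢字 annotated with かんじ, followed by plain を読む.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split text that marks furigana with Unicode's
 * interlinear annotation characters into (base, reading) runs.
 * Assumptions: no nesting, exactly one separator per annotation.
 */
public class Furigana {
    public static final char ANCHOR = '\uFFF9';     // start of annotated base text
    public static final char SEPARATOR = '\uFFFA';  // base ends, reading begins
    public static final char TERMINATOR = '\uFFFB'; // reading ends

    /** A run of base text and its reading (empty if not annotated). */
    public static final class Run {
        public final String base, ruby;
        Run(String base, String ruby) { this.base = base; this.ruby = ruby; }
        @Override public String toString() { return ruby.isEmpty() ? base : base + "(" + ruby + ")"; }
    }

    public static List<Run> parse(String text) {
        List<Run> runs = new ArrayList<>();
        StringBuilder plain = new StringBuilder(); // unannotated text collected so far
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == ANCHOR) {
                if (plain.length() > 0) {
                    runs.add(new Run(plain.toString(), ""));
                    plain.setLength(0);
                }
                int sep = text.indexOf(SEPARATOR, i + 1);
                int end = text.indexOf(TERMINATOR, sep + 1);
                if (sep < 0 || end < 0) {
                    throw new IllegalArgumentException("unterminated annotation at index " + i);
                }
                runs.add(new Run(text.substring(i + 1, sep), text.substring(sep + 1, end)));
                i = end + 1;
            } else {
                plain.append(c);
                i++;
            }
        }
        if (plain.length() > 0) {
            runs.add(new Run(plain.toString(), ""));
        }
        return runs;
    }

    public static void main(String[] args) {
        // A two-kanji compound annotated with its hiragana reading, then plain text.
        String s = ANCHOR + "\u6F22\u5B57" + SEPARATOR + "\u304B\u3093\u3058" + TERMINATOR
                + "\u3092\u8AAD\u3080";
        for (Run r : parse(s)) {
            System.out.println(r);
        }
    }
}
```

A renderer would then draw each `Run`'s `ruby` in a smaller size above the `base` (or to its right, for vertical text).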
(Please correct me if any of my statements above are wrong.)
Basically, I guess what I'm trying to find are formal definitions of the
conversion of character strings into spatial layouts of glyphs.
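As one concrete and much simplified instance of turning a character string into a glyph layout, the Java sketch below groups each base character with the non-spacing marks that follow it and hangs those marks below the base, stacking a second consecutive vowel beneath the first, which is one of the options listed in (a). Everything here is an assumption for illustration: fixed advance and mark height, all marks placed below (holam, for instance, actually sits above), left-to-right pen movement (a Hebrew renderer would run right to left), and a mark with no preceding base simply gets its own cell. `MarkStacker` and `layout` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: attach each non-spacing mark to the preceding base
 * character and stack consecutive marks downward beneath it.
 */
public class MarkStacker {
    /** A glyph to draw and its offset from the line origin (y grows downward, as in java.awt). */
    public static final class Placed {
        public final int codePoint, x, y;
        Placed(int codePoint, int x, int y) { this.codePoint = codePoint; this.x = x; this.y = y; }
    }

    static boolean isMark(int cp) {
        // Hebrew vowel points such as patah (U+05B7) have general category Mn.
        return Character.getType(cp) == Character.NON_SPACING_MARK;
    }

    public static List<Placed> layout(String text, int advance, int markHeight) {
        List<Placed> out = new ArrayList<>();
        int pen = 0;      // x where the next base character will go
        int baseX = 0;    // x of the current base character
        int depth = 0;    // marks already stacked under the current base
        boolean haveBase = false;
        for (int cp : text.codePoints().toArray()) {
            if (haveBase && isMark(cp)) {
                depth++;
                out.add(new Placed(cp, baseX, depth * markHeight));
            } else {
                // A base character, or a mark with nothing to attach to: give it a cell.
                baseX = pen;
                out.add(new Placed(cp, baseX, 0));
                pen += advance;
                depth = 0;
                haveBase = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // bet + patah + qamats (two vowel points in a row), then shin
        String s = "\u05D1\u05B7\u05B8\u05E9";
        for (Placed p : layout(s, 10, 4)) {
            System.out.println(String.format("U+%04X at (%d, %d)", p.codePoint, p.x, p.y));
        }
        // U+05D1 at (0, 0)
        // U+05B7 at (0, 4)
        // U+05B8 at (0, 8)
        // U+05E9 at (10, 0)
    }
}
```

Switching to one of the other options in (a) only changes the mark branch: overtyping would keep `y` fixed at `markHeight`, and "empty spaces" would advance `pen` for each extra mark.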
Any pointers to websites, books, standards, etc. would be greatly
appreciated.
Paul J. Lewis
"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning."
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT