From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Wed Jul 06 2005 - 15:45:48 CDT
On Wed, 6 Jul 2005, Leiter Phelix wrote:
>> I am seeking some advice on the use of a capital H with a bar under
>> (see lowercase character 1E96).
>>
>> The character seems it ought to be a valid one and is used in Hefa in
>> Israel - this is made more likely by the inclusion of the lowercase
>> character in the Unicode range (1E96).
As far as I know, the only documented usage for 1E96 is in some
transliteration systems for Semitic languages, such as transliteration of
Arabic according to ISO 233. Although Arabic does not make a case
distinction, it is customary and normal to use mixed case in
transliterated words and texts, using e.g. a capital letter at the start
of a proper noun. Thus, I too find it strange that the corresponding
capital letter has no code position in Unicode and that 1E96 has no
uppercase mapping.
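You can confirm this with Python's standard `unicodedata` module, which follows the Unicode Character Database version shipped with the interpreter. The sketch below assumes Python 3. Looking up the small letter by name succeeds. The capital name is only my guess at what a precomposed capital would be called; it is not an actual Unicode name, and the lookup fails. One aside: `str.upper()` also applies the one-to-many mappings in SpecialCasing.txt, so it may still return "H" followed by U+0331, even though the simple one-to-one uppercase field is empty.

```python
import unicodedata

# The small letter exists as a single code point.
small = unicodedata.lookup("LATIN SMALL LETTER H WITH LINE BELOW")
print(f"U+{ord(small):04X}")

# Hypothetical name: what a precomposed capital would presumably be called.
try:
    unicodedata.lookup("LATIN CAPITAL LETTER H WITH LINE BELOW")
    has_capital = True
except KeyError:
    has_capital = False
print("precomposed capital exists:", has_capital)

# str.upper() uses the full (possibly one-to-many) case mappings,
# so the result may be longer than one code point.
upper = small.upper()
print([f"U+{ord(c):04X}" for c in upper])
```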
>> Could somebody please advise me:
>> 1) how to construct the character by using floating marks - as my
>> results do not provide as good a representation as the lowercase
>> version
As Clark Cox replied, U+0048 U+0331 is the Unicode representation of
capital H with bar under, or rather "with line below", to use the wording
of the Unicode name of U+1E96*). You can write the character in Unicode
even though it has no code
position of its own, and the standard _could_ specify U+0048 U+0331 as
the uppercase mapping of U+1E96. I don't understand why it doesn't.
*) The naming is somewhat odd, because the combining diacritic U+0331
is named "combining macron below".
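Here is a small sketch that makes the footnote concrete, again using Python's `unicodedata` (Python is my choice for illustration, not something from the thread). The capital is written as two code points. The character names and the canonical decomposition of U+1E96 show that the "line below" of the precomposed letter is the "macron below" of the combining mark.

```python
import unicodedata

capital = "H\u0331"   # U+0048 followed by U+0331; no single code point exists
small = "\u1e96"      # precomposed small h with line below

# Print code point and official name of each character.
for ch in capital + small:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The precomposed letter decomposes canonically to h + COMBINING MACRON BELOW.
print(unicodedata.decomposition(small))

# Canonical combining class 220 is the class for marks placed below the base.
print(unicodedata.combining("\u0331"))
```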
Anyway, when you use U+0048 U+0331, you are asking programs to construct a
rendering by adding a line under "H", whereas for U+1E96, programs may
use a glyph from a suitable font. So the rendering mechanisms can be
rather different. In current software, the dynamic construction of
characters with diacritic marks is usually qualitatively poor and does not
really correspond to the Unicode standard's idea of such construction.
You may well get different renderings by using U+1E96 and U+0068 U+0331,
even though they are canonically equivalent. Programs typically render
U+0068 U+0331 using their rather primitive method for dynamic
construction, instead of recognizing the sequence as equivalent to U+1E96
and using its glyph instead.
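The equivalence described above can be checked with normalization. This is a minimal sketch assuming Python 3's `unicodedata.normalize`. NFD splits U+1E96 into U+0068 U+0331, and NFC joins them back. A program that normalizes before comparing, or before choosing a glyph, therefore treats both spellings alike. The capital sequence has no precomposed form to compose into, so it stays two code points even under NFC. The helper `same_text` is a hypothetical name used only for illustration.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings up to canonical equivalence (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u1e96"    # small h with line below, one code point
sequence = "h\u0331"      # h + COMBINING MACRON BELOW

print(precomposed == sequence)            # False: different code points
print(same_text(precomposed, sequence))   # True: canonically equivalent

nfd = unicodedata.normalize("NFD", precomposed)  # decomposes to two code points
nfc = unicodedata.normalize("NFC", sequence)     # composes to one code point
print(len(nfd), len(nfc))

# No precomposed capital exists, so NFC leaves this as two code points.
print(len(unicodedata.normalize("NFC", "H\u0331")))
```

Normalizing does not fix rendering by itself. It only lets a program notice that the two spellings are the same text.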
I'm afraid there is not much you can do, except perhaps try another
program and/or another font, if possible. (In a simple test, I noticed
that using Arial Unicode MS, U+0048 U+0331 looks rather bad - the line
under is positioned too far to the left, but in Times New Roman it looks
acceptable to me. Your mileage most probably varies.)
>> 2) why this character is not in the Unicode range or whether it is/has
>> been considered for inclusion
As I wrote, I don't know this piece of history. But given that it has
no code position now, it is very probable that it will never be added.
The general policy is to avoid adding new precomposed characters for
combinations that can already be written as a base letter followed by a
combining diacritic mark; we are supposed to use such combining sequences
instead.
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/