RE: Missing capital H from Unicode range (see 1E96)

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Jul 07 2005 - 00:38:54 CDT

    On Wed, 6 Jul 2005, Peter Constable wrote:

    >> Anyway, when you use U+0048 U+0331, you are asking programs to construct a
    >> rendering by adding a line under the "H", whereas for U+1E96, programs
    >> may use a glyph from a suitable font.
    >
    > I don't know about anybody else's software, but in MS software both
    > would be assumed to be handled by using appropriate glyphs from a
    > suitable font.

    I'm afraid there's some confusion here. Although a font might conceivably
    contain a glyph for a combination that does not have a code position of
    its own in Unicode, I don't think that's what we can normally expect; the
    number of characters that can be represented using combining diacritic
    marks is _huge_.

    Is there a font that contains a glyph for "H" with line under?
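
    The asymmetry behind this question can be checked with Unicode
    normalization. The following is only a minimal sketch using Python's
    standard unicodedata module; the variable names and the code_points
    helper are illustrative assumptions, not part of the original
    discussion. It shows that "h" plus U+0331 composes to U+1E96 under
    NFC, whereas "H" plus U+0331 has no precomposed form and stays a
    two-code-point sequence.

```python
import unicodedata

# Base letter followed by U+0331 COMBINING MACRON BELOW ("line under").
lower = "h\u0331"
upper = "H\u0331"

# NFC composes a sequence only if a precomposed character exists for it.
nfc_lower = unicodedata.normalize("NFC", lower)
nfc_upper = unicodedata.normalize("NFC", upper)


def code_points(s):
    """Return the code points of s in U+XXXX notation."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


print(code_points(nfc_lower))  # U+1E96
print(code_points(nfc_upper))  # U+0048 U+0331
```

    So for the capital letter, rendering software always receives the
    decomposed sequence and has to position the mark itself or find a
    suitable glyph in the font by other means.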

    > Well, I'm not sure what software or version of Arial Unicode MS you're
    > using. In current MS software, it comes out perfectly fine:

    The image you included looks like _lowercase_ h with line under to me.

    What I get for U+0048 U+0331 using Arial Unicode MS version 1.01 on MS
    Word 2002 under Windows XP has the line under on the right (sorry for
    writing "left" in my original description), not nicely centered under the
    "H". My experiences with MS Word, WordPad, and Internet Explorer for
    different combinations of a base character and a combining diacritic mark
    have never yielded anything but a "mechanical" composition, as if the base
    character were printed, printing position backspaced, and a fixed
    diacritic (in a shape and position that does not depend on the base
    character) overprinted. Maybe I just didn't try hard enough, or cleverly
    enough.

    Such a simple construction, which corresponds to the overprinting used
    long ago on typewriters and printers, often produces a tolerable result,
    or at least an understandable result. But it may completely fail too, of
    course, depending on the combination.

    I created a trivial demo document for testing how Web browsers deal with
    this:
    http://www.cs.tut.fi/~jkorpela/test/h.html
    It contains just "H" followed by U+0331 (H̱) in large font size, so that it can be tested
    using different fonts just by changing the browser's default font.
    Internet Explorer 6 usually shows just "H" followed by a rectangle...
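
    A test page of this kind can be generated along the following lines.
    This is a sketch under assumptions, not the actual content of the demo
    document above: the function name build_test_page, the 72pt size, and
    the markup are hypothetical. No font family is set, so the browser's
    default font applies and can be changed for each test run.

```python
def build_test_page(base="H", mark=0x0331, size_pt=72):
    """Build a minimal HTML page showing base + combining mark in large type.

    The mark is written as a numeric character reference so that the
    page source itself stays plain ASCII.
    """
    sample = f"{base}&#x{mark:04X};"
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\">"
        "<title>Combining mark test</title></head>\n"
        # Deliberately no font family: the browser default is used.
        f"<body><p style=\"font-size: {size_pt}pt\">{sample}</p></body></html>\n"
    )


page = build_test_page()
print(page)
```

    Calling build_test_page("h") would give the lowercase counterpart for
    comparison in the same browser and font.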

    >> As I wrote, I don't know this piece of history. But given the fact
    >> that it has no code position now, it is very probable that it will not be
    >> added.
    >
    > Even stronger: it is certain that it will not be added.

    I know it's a decided policy; I used the words "very probable", because
    permanent decisions are sometimes changed later.

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

