Re: Why people still want to encode precomposed letters

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Nov 15 2008 - 16:55:36 CST


    Karl Pentzlin wrote:

    > I am just writing a mail to someone in Russia who suggests to encode a
    > "barred o with macron" which is used in the Orok language.

    I think it is best to explain realistically that, as a matter of policy,
    characters with diacritic marks will not be added to Unicode as separately
    encoded characters, i.e. as code points of their own. You can say this in
    different formulations and tones, of course. There’s no point in getting
    into long arguments.

    > Trying to explain to him that the encoding of such letters is not
    > needed, as sequences like U+04E9 U+0304 are appropriate, I have
    > created a little Internet page to prove this:
    > http://www.pentzlin.com/Orok.html

    Well, the page seems to prove just the opposite, as you say, so it’s not
    useful for your purposes. The point is that even though U+04E9 U+0304 doesn’t
    work universally, or even widely, it’s the only way.
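
    To make the "only way" point concrete, here is a minimal sketch using
    Python's standard unicodedata module (the variable names are just for
    illustration). It shows that NFC normalization leaves U+04E9 U+0304 as two
    code points, because no precomposed character exists for it, whereas a
    pair such as U+0438 U+0304 does compose to a single code point.

```python
import unicodedata

# The sequence discussed in this thread: barred o followed by a combining macron.
seq = "\u04e9\u0304"

for ch in seq:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"(combining class {unicodedata.combining(ch)})")

# NFC composes a base + mark pair only when a precomposed code point exists.
composed = unicodedata.normalize("NFC", seq)
print([f"U+{ord(c):04X}" for c in composed])  # still two code points

# For contrast: Cyrillic small i + macron has a precomposed form, U+04E3.
contrast = unicodedata.normalize("NFC", "\u0438\u0304")
print([f"U+{ord(c):04X}" for c in contrast])  # a single code point
```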

    > I am horrified to see the result, using a computer with the newest
    > version of Microsoft Vista and Internet Explorer (see attached
    > Orok.png). Firefox does not perform better.

    What happens is that your browser has Times New Roman as the default font,
    which contains (in your system, as well as mine) U+04E9 but not U+0304.
    Hence the latter is taken from some other font, such as Arial Unicode MS. It
    is no surprise that a diacritic from one font does not play well with a base
    letter from another font. And if your browser had e.g. Calibri as the
    default font, you might see just a macron with no base character, as I did
    when I first looked at your page.
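
    The fallback behaviour described above can be sketched as a toy model.
    This is not how any particular browser is implemented; the COVERAGE table
    and the assign_fonts function are hypothetical, and the table contains
    only what is stated in this message about the two fonts.

```python
# Hypothetical coverage table, limited to what this message states.
COVERAGE = {
    "Times New Roman": {0x04E9},           # has the base letter, lacks the macron
    "Arial Unicode MS": {0x04E9, 0x0304},  # has both
}

def assign_fonts(text, font_stack):
    """Pair each character with the first font in the stack that covers it."""
    result = []
    for ch in text:
        font = next(
            (f for f in font_stack if ord(ch) in COVERAGE.get(f, set())),
            None,  # no font in the stack covers this character
        )
        result.append((f"U+{ord(ch):04X}", font))
    return result

# Default font first, fallback font second: the two characters end up
# in different fonts, which is the mismatch described above.
mixed = assign_fonts("\u04e9\u0304", ["Times New Roman", "Arial Unicode MS"])
print(mixed)

# Putting a font that covers both first keeps them together.
same = assign_fonts("\u04e9\u0304", ["Arial Unicode MS", "Times New Roman"])
print(same)
```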

    When creating web pages with more or less special characters, you just need
    to consider font issues. If you want to present U+04E9 U+0304, then you
    should suggest, in your CSS style sheet, fonts that contain both.
    Unavoidably at present, some users won’t have any of those fonts installed.
    The world isn’t perfect quite yet. (In fact, I’m afraid Arial Unicode MS
    would be about the only font that is anywhere near common and has both of
    them.)
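
    As a rough sketch of that advice: given coverage data for candidate fonts,
    keep only those that contain every character you need and list them in a
    font-family declaration. The coverage table and both helper functions are
    hypothetical; real coverage would have to be read from the font files
    themselves.

```python
# Hypothetical coverage data; only the facts stated in this message are used.
COVERAGE = {
    "Times New Roman": {0x04E9},
    "Arial Unicode MS": {0x04E9, 0x0304},
}

def fonts_covering(text, coverage):
    """Return the fonts whose coverage includes every character of text."""
    return [name for name, cps in coverage.items()
            if all(ord(ch) in cps for ch in text)]

def css_font_family(fonts, generic="sans-serif"):
    """Build a CSS font-family declaration, ending with a generic family."""
    names = [f'"{f}"' for f in fonts] + [generic]
    return "font-family: " + ", ".join(names) + ";"

suitable = fonts_covering("\u04e9\u0304", COVERAGE)
declaration = css_font_family(suitable)
print(declaration)
```

    Users without any of the listed fonts still fall through to the generic
    family, so this only improves the odds; it does not guarantee the result.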

    Even with Arial Unicode MS for both characters, the visual appearance is
    barely tolerable for U+04E9 U+0304 (the macron isn’t horizontally centered
    on the base character) and completely unacceptable for U+04E8 U+0304, even
    in Microsoft Word 2007: the macron crosses the base character so that the
    diacritic cannot be recognized; it looks like some dirt on the base letter.

    > Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
    > user's needs, as long as leading operating systems behave like this
    > more than 10 years after Unicode has decided no longer to accept
    > precomposed characters.

    I don’t see how this has anything to do with operating systems.

    It’s a matter of fonts and a matter of application programs and the
    libraries they use.

    Too bad if you really need those characters. But encoding new letters with
    diacritics as code points wouldn’t help. Even if it were possible to add
    them to Unicode, it would take many years before they were added and then
    implemented widely in fonts that are available on people’s computers. It is
    much more realistic to hope for (and maybe to fight for) better
    implementation of the existing Unicode characters in fonts and rendering
    systems.

    -- 
    Yucca, http://www.cs.tut.fi/~jkorpela/ 
    

