From: Jukka K. Korpela (firstname.lastname@example.org)
Date: Sat Nov 15 2008 - 16:55:36 CST
Karl Pentzlin wrote:
> I am just writing a mail to someone in Russia who suggests to encode a
> "barred o with macron" which is used in the Orok language.
I think it is best to explain realistically that, as a matter of policy,
characters with diacritic marks will not be added to Unicode as separately
encoded characters, i.e. as code points of their own. You can say this in
different formulations
and tones, of course. There’s no point in getting into long arguments.
> Trying to explain to him that the encoding of such letters is not
> needed, as sequences like U+04E9 U+0304 are appropriate, I have
> created a little Internet page to prove this:
Well, the page seems to prove just the opposite, as you say, so it’s not
useful for your purposes. The point is that even though U+04E9 U+0304 doesn’t
work universally, or even widely, it’s the only way
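As a rough illustration of why the sequence is the only representation available, here is a minimal Python sketch using only the standard unicodedata module. It assumes nothing beyond the Unicode Character Database shipped with Python; the helper name describe is made up for this example. Normalizing to NFC composes a base letter and a combining mark only when a precomposed character exists, so the barred o with macron stays as two code points, whereas Cyrillic i with macron collapses to one.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Barred o + combining macron: no precomposed character exists,
# so NFC normalization leaves the two code points as they are.
barred_o_macron = unicodedata.normalize("NFC", "\u04E9\u0304")

# For contrast: Cyrillic i + combining macron has a precomposed
# form, so NFC composes the pair into a single code point.
i_macron = unicodedata.normalize("NFC", "\u0438\u0304")

for label, text in (("barred o + macron", barred_o_macron),
                    ("i + macron", i_macron)):
    print(label, describe(text))
```

The first line of output lists two entries and the second lists one, which is the whole difference between the two cases as far as the encoding goes.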
> I am horrified to see the result, using a computer with the newest
> version of Microsoft Vista and Internet Explorer (see attached
> Orok.png). Firefox does not perform better.
What happens is that your browser has Times New Roman as the default font,
which contains (in your system, as well as mine) U+04E9 but not U+0304.
Hence the latter is taken from some other font, such as Arial Unicode MS. It
is no surprise that a diacritic from one font does not play well with a base
letter from another font. And if your browser had e.g. Calibri as the
default font, you might see just a macron with no base character, as I did
when I first looked at your page.
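The fallback behaviour described above can be sketched in a few lines of Python. This is a simplified model, not how any particular browser actually works: it assumes fallback is decided per character, and the coverage table holds only what this message states about the two fonts rather than data read from real font files. The names COVERAGE and pick_fonts are invented for the example.

```python
# Illustrative coverage only, taken from the statements in this message,
# not measured from real font files.
COVERAGE = {
    "Times New Roman": {0x04E9},
    "Arial Unicode MS": {0x04E9, 0x0304},
}

def pick_fonts(text, font_stack, coverage=COVERAGE):
    """For each character, pick the first font in font_stack that covers it.

    Returns a list of (code point, font name) pairs; the font name is
    None when no font in the stack covers the character.
    """
    result = []
    for ch in text:
        font = next(
            (f for f in font_stack if ord(ch) in coverage.get(f, set())),
            None,
        )
        result.append((f"U+{ord(ch):04X}", font))
    return result

# Default font first: base letter and macron come from different fonts.
mixed = pick_fonts("\u04E9\u0304", ["Times New Roman", "Arial Unicode MS"])

# A font that has both characters listed first: one font for the pair.
single = pick_fonts("\u04E9\u0304", ["Arial Unicode MS", "Times New Roman"])

print(mixed)
print(single)
```

With Times New Roman first, the base letter and the macron are drawn from two different fonts, which is the situation in which the mark cannot be expected to sit properly on the letter.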
When creating web pages with more or less special characters, you just need
to consider font issues. If you want to present U+04E9 U+0304, then you
should suggest, in your CSS style sheet, fonts that contain both.
Unavoidably at present, some users won’t have any of those fonts installed.
The world isn’t perfect quite yet. (In fact, I’m afraid Arial Unicode MS
would be about the only font that is anywhere near common and has both of
them.)
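One way to arrive at such a font suggestion is sketched below in Python, under clearly stated assumptions: the coverage table is illustrative (it records only what this message says about the two fonts), and the selector .orok, the generic fallback and both function names are hypothetical. The idea is simply to keep the fonts that cover every character of the text and to emit a font-family declaration listing them.

```python
def fonts_covering(text, coverage):
    """Return the fonts, in table order, that cover every character of text."""
    needed = {ord(ch) for ch in text}
    return [name for name, code_points in coverage.items()
            if needed <= code_points]

def font_family_rule(selector, fonts, generic="sans-serif"):
    """Build a CSS rule suggesting the given fonts, then a generic family."""
    quoted = ", ".join(f'"{name}"' for name in fonts)
    families = f"{quoted}, {generic}" if fonts else generic
    return f"{selector} {{ font-family: {families}; }}"

# Illustrative coverage only, as stated in this message; not measured.
coverage = {
    "Times New Roman": {0x04E9},
    "Arial Unicode MS": {0x04E9, 0x0304},
}

# ".orok" is a hypothetical class for the text that needs the sequence.
rule = font_family_rule(".orok", fonts_covering("\u04E9\u0304", coverage))
print(rule)
```

A real page would of course need coverage data from the actual fonts, and users without any of the listed fonts still fall back to whatever their browser picks.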
Even with Arial Unicode MS for both characters, the visual appearance is
barely tolerable for U+04E9 U+0304 (the macron isn’t horizontally centered
over the base character) and completely unacceptable for U+04E8 U+0304,
even in Microsoft Word 2007: the macron crosses the base character so that
the diacritic cannot be recognized as such; it looks like some dirt on
the base letter.
> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
> user's needs, as long as leading operating systems behave like this
> more than 10 years after Unicode has decided no longer to accept
> precomposed characters.
I don’t see how this has anything to do with operating systems.
It’s a matter of fonts and a matter of application programs and the
libraries they use.
Too bad if you really need those characters. But encoding new letters with
diacritics as code points wouldn’t help. Even if it were possible to add
them to Unicode, it would take many years before they were added and
widely implemented in fonts that are available on people’s
computers. It is much more realistic to hope for (and maybe to fight for)
better implementation of the existing Unicode characters in fonts and
-- Yucca, http://www.cs.tut.fi/~jkorpela/