Re: problem with combining diacritics in HTML5

From: Jukka K. Korpela <jkorpela_at_cs.tut.fi>
Date: Sun, 07 Oct 2012 10:37:05 +0300

2012-10-07 8:38, Bill Poser wrote:

> I have a web page that writes into an HTML5 textarea via the javascript
> dom interface. U+0332 COMBINING LOW LINE is incorrectly rendered as a
> spacing low line in both Mozilla Firefox and Google Chrome

The issue is not limited to textareas but appears in normal text too,
when the font is set to Courier New. You can also see the problem in
Microsoft Word, for example, when using that font. The point is that
this is a font problem, and you can see it in textareas because they
typically have Courier New as the default font.

Inspecting the Courier New font, version 5.11, I noticed that the
advance width of the glyph for U+0332 (glyph uni0332) is 1129 units. I
think this explains it all. The advance width should be 0.
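A check like this can be scripted. The following is only a sketch: the
helper names are hypothetical, the toy metrics are made up for
illustration, and load_tables assumes the third-party fontTools package
is installed. It lists the combining marks that a font gives a non-zero
advance width.

```python
import unicodedata


def marks_with_nonzero_advance(cmap, advances):
    """Return {code point: advance} for combining marks whose advance is not 0.

    cmap:     dict mapping code point (int) -> glyph name
    advances: dict mapping glyph name -> advance width in font units
    """
    bad = {}
    for cp, glyph in cmap.items():
        # General categories Mn and Me are the non-spacing and enclosing marks.
        is_mark = unicodedata.category(chr(cp)) in ("Mn", "Me")
        if is_mark and advances.get(glyph, 0) != 0:
            bad[cp] = advances[glyph]
    return bad


def load_tables(path):
    """Read the cmap and advance widths from a font file (needs fontTools)."""
    from fontTools.ttLib import TTFont  # third-party: pip install fonttools

    font = TTFont(path)
    cmap = font.getBestCmap()
    advances = {name: adv for name, (adv, _lsb) in font["hmtx"].metrics.items()}
    return cmap, advances


# Made-up glyph names and metrics, not read from any real font file.
toy_cmap = {0x0061: "a", 0x0332: "uni0332"}
toy_advances = {"a": 1200, "uni0332": 1200}
print(marks_with_nonzero_advance(toy_cmap, toy_advances))
```

With a real font, one would pass the result of load_tables(path) to
marks_with_nonzero_advance instead of the toy dictionaries.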

Courier New has the same issue with other combining marks, too.

And other fonts have the same problem, at least the following: Courier,
DejaVu Sans Mono, Fixedsys, Meiryo, Meiryo UI, Modern, Sun-ExtA,
Terminal, VL PGothic.

Presumably, many designers of monospace fonts have just failed to set
the advance width of combining marks to zero. After all, being monospace
means having the same advance width for all characters – but this should
be understood in terms of “intuitive characters”, treating e.g. a base
character and a combining mark as one character.
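To illustrate the "intuitive character" idea, here is a rough sketch
that counts the cells a string should occupy in a monospace layout. The
function name is hypothetical, and the rule is deliberately simplified:
it only skips non-spacing and enclosing marks, and it ignores wide East
Asian characters, control characters and full grapheme cluster rules.

```python
import unicodedata


def cell_count(text):
    """Count monospace cells, giving combining marks no cell of their own."""
    return sum(
        1 for ch in text
        if unicodedata.category(ch) not in ("Mn", "Me")
    )


sample = "a\u0332b\u0332c"  # a and b each followed by COMBINING LOW LINE
print(len(sample), cell_count(sample))
```

The string has five code points but should occupy only three cells; a
font that gives U+0332 a full advance width lays it out in five.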

Not all monospace fonts have this problem. If you wish to use a
monospace font in a textarea (there is really no good reason to do this
in most cases!), you can declare, in CSS, e.g.

textarea { font-family: Consolas, FreeMono, "Everson Mono"; }

This, however, would help only if at least one of those fonts is
available to the user’s browser.

I do not see the problem in Mozilla Firefox (version 15.0.1, which is
the newest released version; tested on Windows 7). I suspect the reason
is that this browser, at least in its newest version, is intelligent
enough to know that U+0332 is defined to be a combining mark, so it
overrides the advance width specified in the font.

> Characters with a
> combining low line encoded as a single Unicode codepoint are rendered
> correctly.

That’s because they are rendered using a single glyph taken from some font.
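The difference between a precomposed character and a combining sequence
can be seen with Unicode normalization. This is a small sketch using
Python's standard unicodedata module; the helper name is hypothetical.
As far as I know, the precomposed letters "with line below" decompose to
U+0331 COMBINING MACRON BELOW, whereas a sequence with U+0332 has no
precomposed form, so it always stays two code points and needs two
glyphs.

```python
import unicodedata


def composed_code_points(text):
    """Return the code points of the NFC form of text as U+XXXX strings."""
    composed = unicodedata.normalize("NFC", text)
    return [f"U+{ord(ch):04X}" for ch in composed]


print(composed_code_points("b\u0331"))  # b + COMBINING MACRON BELOW
print(composed_code_points("b\u0332"))  # b + COMBINING LOW LINE
```

The first sequence composes to a single code point, which a font can
cover with a single glyph; the second does not compose.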

> Incomplete Unicode support
> in the HTML5 spec?

HTML specifications do not require Unicode support. The same applies to
working drafts called “HTML5 spec”. They all define the character
concept as referring to Unicode characters, but they do not impose any
requirement on conformance to the Unicode standard. See “Unicode
conformance model” http://unicode.org/reports/tr33/ for a description of
conformance requirements.

Yucca
Received on Sun Oct 07 2012 - 02:41:42 CDT
