Re: Characters that should be displayed?

From: Philippe Verdy <>
Date: Mon, 30 Jun 2014 18:33:11 +0200

I generally agree with your comment.

For your question: U+FFFD is not special in CSS. It is just a standard
character that will be mapped to some symbol, either taken from any
available font or synthesized from an internal font (or collection of
glyphs) of the renderer, according to the other styles. There is no
guarantee that styles like italic or bold will look different; in fact
there is no good way to exhibit alternatives if the renderer does not look
up a matching font, but at least the renderer should size the symbol
according to the computed "font-size:" setting. That symbol is often (but
not necessarily) a "white" question mark in a "black" diamond; replace
"white" by the background color/image/shades, and "black" by the "color:"
setting, just as for any other symbol mapped from a regular font.

This symbol should also have an inherited direction, not a strong LTR
direction: it should not alter the direction of the text on either side
(or break runs of text) for Bidi rendering. It may possibly be mirrored in
resolved RTL runs, if this is appropriate for the chosen glyph (which is
not always easy to determine when the symbol is taken from a matching font
in context). But as the symbol to use is quite arbitrary, and only needs
to be distinctive enough from other characters, this mirroring is not
really necessary, unless the symbol shows some explicit text in a specific
style; that is something to avoid, as the character is not specific to any
script or language.
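
As a quick check of the directionality point, here is a minimal Python
sketch that reads the properties of U+FFFD from the Unicode Character
Database through the standard unicodedata module. The helper name
describe() is only illustrative. The exact data depends on the Unicode
version bundled with the interpreter, but U+FFFD is expected to be a
neutral (ON) symbol that is not flagged as mirrored.

```python
import unicodedata

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def describe(ch):
    """Return a few UCD properties of a single character (illustrative helper)."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }


info = describe(REPLACEMENT)
print(info)

# For comparison: a strong LTR letter, and a neutral character that is mirrored.
print(describe("A"))
print(describe("("))
```

So the character data itself already gives U+FFFD a neutral direction;
any mirroring of the glyph would be a choice of the renderer, not
something requested by the character properties.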

2014-06-30 17:59 GMT+02:00 Konstantin Ritt <>:

> 2014-06-29 22:24 GMT+03:00 Asmus Freytag <>:
>> but things get harder the more I think:
>>> 3. When the above text says “surrogate code points”, does that mean
>>> everything outside BMP? It reads so to me, but I’m surprised that
>>> characters in BMP and outside BMP have such differences, so I’m doubting my
>>> English skill.
>> No, those would be supplementary code points. Surrogates are values that
>> are intended to be used in pairs as code units in UTF-16. Ill-formed data
>> may contain unpaired values, those are referred to as Surrogate code points.
> IIRC, after HTML parsing, validation and DOM building, no lone
> surrogate code point can remain in the text, since the presence of any
> ill-formed data in Unicode text makes the whole text ill-formed.
> It's a security recommendation to decoders to replace any
> unpaired surrogate code point with U+FFFD instead, thus making the text
> well-formed. As a side effect, the unpaired surrogate code point becomes
> visible (usually as a square box fallback glyph).
> What is the consideration regarding U+FFFD in CSS?
> Konstantin
> _______________________________________________
> Unicode mailing list
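
To illustrate the distinction made in the quoted messages between
supplementary code points and surrogates, and the recommended replacement
of unpaired surrogates by U+FFFD, here is a minimal Python sketch working
on a list of UTF-16 code units. The function names are hypothetical, and
this is only one possible replacement policy (one U+FFFD per unpaired
code unit), not the behavior of any particular HTML parser or decoder.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_high_surrogate(u):
    return 0xD800 <= u <= 0xDBFF


def is_low_surrogate(u):
    return 0xDC00 <= u <= 0xDFFF


def is_supplementary(cp):
    """True for code points outside the BMP (U+10000..U+10FFFF)."""
    return 0x10000 <= cp <= 0x10FFFF


def decode_utf16_units(units):
    """Decode UTF-16 code units, replacing each unpaired surrogate by U+FFFD."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if (is_high_surrogate(u) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif is_high_surrogate(u) or is_low_surrogate(u):
            # Unpaired surrogate: ill-formed input, substitute U+FFFD.
            out.append(REPLACEMENT)
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(u))
            i += 1
    return "".join(out)


well_formed = [0x0061, 0xD83D, 0xDE00, 0x0062]  # "a", U+1F600, "b"
ill_formed = [0x0061, 0xD83D, 0x0062]           # "a", lone high surrogate, "b"

print(ascii(decode_utf16_units(well_formed)))
print(ascii(decode_utf16_units(ill_formed)))
```

The first input decodes to three code points, the middle one being
supplementary; the second yields U+FFFD in place of the lone surrogate,
so the result is well-formed text again and CSS only ever sees an
ordinary U+FFFD.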

Received on Mon Jun 30 2014 - 11:34:32 CDT
