Re: Characters that should be displayed?

From: Konstantin Ritt <>
Date: Mon, 30 Jun 2014 18:59:54 +0300

2014-06-29 22:24 GMT+03:00 Asmus Freytag <>:

> but things get harder the more I think:
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values, those are referred to as Surrogate code points.
IIRC, after HTML parsing, validation, and DOM construction, no lone
surrogate code point can remain in the text, since the presence of any
ill-formed data in Unicode text makes the whole text ill-formed.
As a security measure, decoders are recommended to replace any
unpaired surrogate code point with U+FFFD, thus making the text
well-formed. As a side effect, the unpaired surrogate code point becomes
visible (usually as a square box fallback glyph).
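
To illustrate the replacement described above, here is a minimal sketch
in Python. It assumes the input is already a sequence of 16-bit UTF-16
code units; the function name decode_utf16_units and its helpers are
made up for this example and do not come from any HTML or CSS
specification or from a real decoder.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16_units(units):
    """Decode 16-bit code units into a str.

    A well-formed high+low pair becomes one supplementary code point;
    every unpaired surrogate is replaced with U+FFFD, so the result
    never contains a surrogate code point.
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if is_high_surrogate(unit) and i + 1 < n and is_low_surrogate(units[i + 1]):
            # Combine the pair into a code point in U+10000..U+10FFFF.
            cp = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif is_high_surrogate(unit) or is_low_surrogate(unit):
            # Unpaired surrogate: substitute U+FFFD.
            out.append(chr(REPLACEMENT))
            i += 1
        else:
            # Ordinary BMP code unit.
            out.append(chr(unit))
            i += 1
    return "".join(out)
```

For example, decode_utf16_units([0x41, 0xD800, 0x42]) gives "A", U+FFFD,
"B", while the pair [0xD83D, 0xDE00] decodes to the single code point
U+1F600.
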
What is the consideration regarding U+FFFD in CSS?


Unicode mailing list
Received on Mon Jun 30 2014 - 11:01:38 CDT
