Re: Characters that should be displayed? from Konstantin Ritt on 2014-06-30 (Unicode Mail List Archive)

From: Konstantin Ritt <ritt.ks_at_gmail.com>
Date: Mon, 30 Jun 2014 18:59:54 +0300

2014-06-29 22:24 GMT+03:00 Asmus Freytag <asmusf_at_ix.netcom.com>:

> but things get harder the more I think:
>>
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
>>
>
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values; those are referred to as surrogate code points.
>
>
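A minimal Python sketch of the distinction above follows. The helper names `is_surrogate`, `is_supplementary` and `to_surrogate_pair` are hypothetical and chosen only for illustration. A supplementary code point such as U+1F600 lies outside the BMP and is a valid character. In UTF-16 it is stored as two 16-bit code units taken from the surrogate range U+D800..U+DFFF, and a surrogate code point is never a character on its own.

```python
from __future__ import annotations


def is_surrogate(cp: int) -> bool:
    """True for U+D800..U+DFFF, the range reserved for UTF-16 surrogate code units."""
    return 0xD800 <= cp <= 0xDFFF


def is_supplementary(cp: int) -> bool:
    """True for U+10000..U+10FFFF, i.e. code points outside the BMP."""
    return 0x10000 <= cp <= 0x10FFFF


def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not is_supplementary(cp):
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    v = cp - 0x10000             # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
```

Python's own encoder gives the same pair: `"\U0001F600".encode("utf-16-be")` yields the bytes D8 3D DE 00. The surrogate values only ever appear as code units inside that encoding, never as characters in their own right.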
IIRC, after HTML parsing, validation, and DOM construction, no lone
surrogate code point should be encountered, since the presence of any
ill-formed data makes the whole Unicode text ill-formed.
For security reasons, decoders are recommended to replace any
unpaired surrogate code point with U+FFFD instead, which makes the text
well-formed. As a side effect, the unpaired surrogate code point becomes
visible (usually as a square box fallback glyph).
What is the consideration regarding U+FFFD in CSS?
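The replacement strategy described above can be sketched as a small UTF-16 decoder over a list of 16-bit code units. This is only an illustration under stated assumptions. The function name `decode_utf16_units` is hypothetical and is not the code of any particular HTML parser. The sketch emits one U+FFFD for each unpaired surrogate code unit and decodes well-formed pairs normally.

```python
from __future__ import annotations

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units: list[int]) -> str:
    """Decode UTF-16 code units, replacing every unpaired surrogate with U+FFFD."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
                # Well-formed pair: combine into one supplementary code point.
                out.append(chr(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00)))
                i += 2
                continue
            out.append(REPLACEMENT)  # lead surrogate with no trail after it
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # trail surrogate with no lead before it
        else:
            out.append(chr(u))  # ordinary BMP code unit
        i += 1
    return "".join(out)


if __name__ == "__main__":
    text = decode_utf16_units([0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00])
    print([f"U+{ord(c):04X}" for c in text])
    # ['U+0041', 'U+FFFD', 'U+0042', 'U+1F600']
    print(text.encode("utf-8"))  # succeeds: the result is well-formed
```

Because the output contains no lone surrogates, it can be encoded to UTF-8 without errors. By contrast, a Python string that still holds a raw lone surrogate, such as `"\ud800"`, raises `UnicodeEncodeError` on `.encode("utf-8")`.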

Konstantin

_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode
Received on Mon Jun 30 2014 - 11:01:38 CDT
