Re: Characters that should be displayed?

From: Asmus Freytag
Date: Sun, 29 Jun 2014 12:24:05 -0700

On 6/29/2014 11:44 AM, Koji Ishii wrote:
>> Surrogate code points, private-use characters, and control characters are not given the Default_Ignorable_Code_Point property. To avoid security problems, such characters or code points, when not interpreted and not displayable by normal rendering, should be displayed in fallback rendering with a fallback glyph
> By looking at this, my questions are as follows:
> 1. Should control characters that browsers do not interpret be displayed in fallback rendering?
> 2. Should private-use characters (U+E000-F8FF, 0F0000-0FFFFD, 100000-10FFFD) without glyphs be displayed in fallback rendering?
> The answer to these two questions is probably yes, from what I understand of the text quoted above,

Displaying a fallback rendering alerts the user that something is
present that would normally not be visible.

However, these are not the only invisible characters, and many should
not (must not) be rendered, ever (except in diagnostic modes). So, it is
a bit unclear to me what precisely this recommendation buys you, as it
is incomplete.

The recommendation is prefixed with "To avoid security problems,...". If
this is taken to mean that it should apply in contexts that require
strict attention to security issues, then it probably defines a minimum
of what should be done, and other measures need to be taken in addition.
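
As a rough illustration of the quoted recommendation, the decision can be
made from the General_Category alone. This is a minimal sketch in Python,
not a complete policy: the function name needs_fallback_glyph and its
parameters interpreted and has_glyph are hypothetical, and the check covers
only the three classes named in the quoted text (Cc, Cs, Co). It says nothing
about other invisible characters.

```python
import unicodedata

# General_Category values named in the quoted recommendation:
# Cc = control, Cs = surrogate, Co = private use.
FALLBACK_CATEGORIES = {"Cc", "Cs", "Co"}


def needs_fallback_glyph(ch, interpreted=False, has_glyph=False):
    """Return True if `ch` should get a visible fallback glyph.

    `interpreted` and `has_glyph` are assumptions supplied by the caller:
    whether the renderer gives the character some behavior (for example a
    line break), and whether a font provides a glyph for it.
    """
    if interpreted or has_glyph:
        return False
    return unicodedata.category(ch) in FALLBACK_CATEGORIES
```

For example, needs_fallback_glyph("\ue000") is true for a private-use
character with no glyph, while a zero width space (U+200B, category Cf) is
not covered by this check at all.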

> but things get harder the more I think:
> 3. When the above text says “surrogate code points”, does that mean everything outside BMP? It reads so to me, but I’m surprised that characters in BMP and outside BMP have such differences, so I’m doubting my English skill.

No, those would be supplementary code points. Surrogates are values that
are intended to be used in pairs as code units in UTF-16. Ill-formed
data may contain unpaired values; those are referred to as surrogate
code points.
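
The distinction can be sketched in a few lines of Python. The function
names below are hypothetical and the code is only an illustration: it
separates surrogate code points (U+D800..U+DFFF) from supplementary code
points (U+10000..U+10FFFF) and reports unpaired surrogates in a sequence of
UTF-16 code units.

```python
def is_surrogate(cp):
    """True for surrogate code points, U+D800..U+DFFF."""
    return 0xD800 <= cp <= 0xDFFF


def is_supplementary(cp):
    """True for code points outside the BMP, U+10000..U+10FFFF."""
    return 0x10000 <= cp <= 0x10FFFF


def unpaired_surrogates(units):
    """Return the indexes of unpaired surrogates in a list of UTF-16 code units."""
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: well-formed only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate that was not consumed as part of a pair.
            bad.append(i)
        i += 1
    return bad
```

U+1F600 is a supplementary code point encoded in UTF-16 as the pair
D83D DE00, so unpaired_surrogates([0xD83D, 0xDE00]) finds nothing, while
unpaired_surrogates([0x41, 0xD83D, 0x42]) reports index 1.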

> 4. Should every code point that is not given the Default_Ignorable_Code_Point property, and that has neither an interpretation nor a glyph, be displayed in fallback rendering? I could not find such a statement in the Unicode spec, but there are some people who believe so.
> 5. Is there anything else Unicode recommends to display in fallback rendering, or not to display? This must be RTFM, but pointing out where to read would be appreciated.

Unicode mailing list
Received on Sun Jun 29 2014 - 14:24:56 CDT
