Re: C1 Control Pictures Proposal

From: Sean Leonard <>
Date: Sun, 21 Aug 2011 09:44:54 -0700

Hi Ken et al.,

On Aug 17, 2011, at 2:49 PM, Ken Whistler wrote:

> Further comments:
> On 8/13/2011 10:48 AM, Sean Leonard wrote:
>> In accordance with this and other text in the Standard, it is not really possible to assign glyphs uniformly and interchangeably to the code points in U+0000-U+001F and U+0080-U+009F.
> Of course it is. The Unicode Standard has done so for years: they are called code chart
> display glyphs. What one cannot expect is that plain text renderers will display control
> characters as visible glyphs in a uniform fashion -- they aren't supposed to, because
> the control codes aren't graphic characters. That is, rather, what "show hidden" modes
> are all about, and there really aren't any constraints on the details of exactly how
> a show hidden implementation may choose to display the undisplayable, as it were.

Can you please explain which part of the Unicode Standard you are referring to? Is there a "show hidden" mode or code point sequence defined in the Standard? If by "code chart display glyphs" you mean the glyphs printed in the code chart document itself for U+0080, that is beside the point. If you mean a "show hidden code points" mode in an editor (such as a terminal emulator, Emacs, Notepad++, or another editor), I understand what you are getting at, but that is exactly what is unhelpful. As you point out, "there really aren't any constraints on the details of exactly how a show hidden implementation may choose to display the undisplayable", and that is the problem.

One advantage of my proposal is that a font that provides glyphs for these code points can design them to match its other graphic characters (for example, fitting the monospace cell while remaining readable). To those who say "oh, just have an editor show [HOP] or whatever": the editor cannot show [HOP] in a way that is uniform with the rest of the glyphs representing U+0000-U+007F and U+00A0-U+00FF (modulo U+00A0 and U+00AD). How ironic is it that fonts can represent U+0000-U+001F (plus space and delete) uniformly for display, via the Control Pictures block, yet can do no such thing for the other half of these characters, U+0080-U+009F?
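To make the asymmetry concrete, here is a minimal Python sketch of what a "show hidden" renderer for U+0000-U+00FF can do today. The helper name `show_hidden` and the bracketed-hex fallback for C1 are hypothetical choices for illustration only; the only standardized part is the mapping of C0, SPACE, and DELETE onto the Control Pictures block (U+2400-U+2421).

```python
def show_hidden(cp: int) -> str:
    """Return a visible stand-in for a code point in U+0000-U+00FF (hypothetical helper)."""
    if 0x00 <= cp <= 0x1F:
        # C0 controls map onto U+2400..U+241F, offset by the control's own value.
        return chr(0x2400 + cp)
    if cp == 0x20:
        # SPACE -> U+2420 SYMBOL FOR SPACE
        return "\u2420"
    if cp == 0x7F:
        # DELETE -> U+2421 SYMBOL FOR DELETE
        return "\u2421"
    if 0x80 <= cp <= 0x9F:
        # C1 controls: no picture characters exist, so the editor must invent
        # its own multi-character notation (this bracketed hex form is arbitrary).
        return f"[{cp:02X}]"
    # Everything else in Latin-1 is already displayable as itself.
    return chr(cp)
```

For example, `"".join(show_hidden(b) for b in b"A\x1b[0m\x85")` yields `A␛[0m[85]`: ESC becomes a single picture character that a font can style consistently, while NEL (U+0085) becomes four ordinary characters that cannot be told apart from literal text such as the `[0m` right beside it.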

This is definitely not a confusion between glyphs and characters. This is about having code points that give these characters a uniform, as-displayed representation in interchange, so that two systems (e.g., an application and the operating system's graphics rendering subsystem, or that subsystem and the font software the OS uses) can interchange data unambiguously.

The Unicode Standard does not dictate the precise glyphs; it only shows representative glyphs. A font designer could choose among alternative glyphs for the graphic character code point. For example, for U+001B -> U+241B SYMBOL FOR ESCAPE, the font designer could choose ESC (scrunched horizontally), ESC (set diagonally), ^[ (scrunched horizontally; ^[ is a common legacy rendering of ESC), or ESC with a box around it. Because the user has chosen that particular font in that particular editor or rendering session, the user would be guaranteed that ESC rendered as ^[ (scrunched) is visually similar to ^\ (FILE SEPARATOR, scrunched), and in turn to the C1 pictures and to the ordinary graphic characters. No such guarantee can currently be made without C1 Control Pictures.

>> Variation selectors (sec. 16.4), for example, "provide a mechanism for specifying a restriction on the set of glyphs that are used to represent a particular character [examples given of CJK ideographs and Mongolian letters]." Variation selectors and other Unicode-defined control code points are ill-suited to causing C1 values to be displayed, because C1 values have no "display representation" in and of themselves.
> That whole discussion of variation selectors is beside the point. Variation sequences can
> only be defined for *base* characters. Base characters are a subset of graphic
> characters (see D51 in Chapter 3 of the Unicode Standard). Control characters
> aren't graphic characters. Hence they are not base characters, either, and could
> never be used in variation sequences, anyway.

Correct. As noted above, C1 control characters are not graphic characters, so they cannot have graphical variations at all. Let's give them graphics. To display is to know.
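Ken's point can be checked mechanically. The sketch below approximates definitions D50 (graphic character) and D51 (base character) from Chapter 3 using the General_Category values exposed by Python's `unicodedata` module. The helper names `is_graphic` and `is_base` are hypothetical, and the results track whatever Unicode version the interpreter ships with.

```python
import unicodedata

# Major General_Category classes counted as graphic (approximating D50);
# the Space Separator subcategory Zs is checked separately below.
GRAPHIC_MAJOR = {"L", "M", "N", "P", "S"}


def is_graphic(ch: str) -> bool:
    """Approximate D50: graphic means category L, M, N, P, S, or Zs."""
    cat = unicodedata.category(ch)
    return cat[0] in GRAPHIC_MAJOR or cat == "Zs"


def is_base(ch: str) -> bool:
    """Approximate D51: a graphic character that is not a combining mark (M)."""
    return is_graphic(ch) and not unicodedata.category(ch).startswith("M")


# ESC (C0), NEL (C1), SYMBOL FOR ESCAPE, and COMBINING ACUTE ACCENT.
for ch in ("\x1b", "\x85", "\u241b", "\u0301"):
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} base={is_base(ch)}")
```

Both the C0 and C1 controls come out as `Cc` and are therefore not base characters, so neither can start a variation sequence. U+241B SYMBOL FOR ESCAPE is `So`, a graphic base character, and so is eligible in principle; a C1 picture character would presumably share that property.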


> --Ken
Received on Sun Aug 21 2011 - 11:50:11 CDT
