Re: "Missing character" glyph

From: Mark Davis (mark.davis@jtcsv.com)
Date: Thu Aug 01 2002 - 13:50:57 EDT


> - they are not to be publicly transmitted

This means that they are *not* to be used in public web pages. They are
reserved for internal use. They are also not safe: programs may use them for
internal constructs, such as sentinel values or placeholders. If a program
assumes that they are never publicly transmitted and accepts text without
filtering, its data structures may be corrupted. Of course, all programmers
should practice defensive programming, and failing to filter is a security
hole, but for safety, noncharacters should never be publicly transmitted.
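The defensive filtering described above can be sketched as follows. This is a minimal sketch, assuming text arrives as a Python `str` from an untrusted source. The function names (`is_noncharacter`, `strip_noncharacters`, `contains_noncharacter`) are hypothetical, and whether to strip noncharacters or reject the whole input is a policy choice left to the receiving program. The check covers U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, ..., U+10FFFE, U+10FFFF).

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is a Unicode noncharacter."""
    # U+FDD0..U+FDEF: the contiguous block of 32 noncharacters.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF in every plane: masking off the low bit
    # maps both onto ...FFFE, so one comparison catches the pair.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def contains_noncharacter(text: str) -> bool:
    """Hypothetical check a receiver might use to reject untrusted input."""
    return any(is_noncharacter(ord(ch)) for ch in text)


def strip_noncharacters(text: str) -> str:
    """Hypothetical filter that drops noncharacters from untrusted input."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


if __name__ == "__main__":
    print(strip_noncharacters("a\uFDD0b\uFFFFc"))  # prints "abc"
```

Rejecting (via `contains_noncharacter`) is the stricter option, since silently deleting code points can hide the fact that the sender used them as internal markers.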

Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄

----- Original Message -----
From: "Martin Kochanski" <unicode@cardbox.net>
To: <unicode@unicode.org>
Cc: "Asmus Freytag" <asmusf@ix.netcom.com>
Sent: Thursday, August 01, 2002 09:38
Subject: Re: "Missing character" glyph

> At 08:42 01/08/02 -0700, Doug Ewell wrote:
> >Martin Kochanski <unicode at cardbox dot net> wrote:
> >
> >> To look at it another way, virtually the only action that the Unicode
> >> Consortium needs to take to define UNRENDERED CHARACTER is to promise
> >> never to define a character at that code point.
> >
> >I think this is exactly what they have done by creating the
> >"noncharacters" from U+FDD0 through U+FDEF. These code points are
> >guaranteed never to be assigned to real characters.
>
> First - thank you very much for the suggestion! I was looking in the
> printed book, where of course these things aren't mentioned (they arrived in
> 3.0.1).
>
> Google led me to the UTC 84 / L2 181 Minutes, where Motion 84-M6 says of
> not-a-characters:
> - they are not to be publicly transmitted
> - they can be deleted without an impact on the interpretation of the text
> - they should be removed in normalization
> - their presence may also indicate corrupted text.
>
> which suggests that putting them on a web page ought not to be allowed and
> might not work.
>
> But - given that it would require a positive effort from a programmer to
> implement any of these restrictions, I think we're pretty safe in assuming
> that none of it will be enforced in practice, and so using U+FDD0 in online
> help or web pages ought to be safe. In any case, it wouldn't be catastrophic
> if things changed one day.
>
> It also means that I don't have to write any proposals or make committee
> meetings longer than they already are!
>
> [Unless Asmus Freytag, who proposed motion 84-M7 that made FDD0-FDEF
> not-a-characters, has anything to add about the safety of using one of these
> characters in this way?]



This archive was generated by hypermail 2.1.2 : Thu Aug 01 2002 - 11:55:18 EDT