L2/10-081

Title: Property Values for U+FFFC
Source: Ken Whistler
Date: March 1, 2010
Action: For consideration by the UTC

The email discussion appended below between Mark Davis and Asmus Freytag (dating from late 2008) raises an issue regarding U+FFFC OBJECT REPLACEMENT CHARACTER, which was not resolved for Unicode 5.2. I suggest that the UTC take this up and come to a resolution for Unicode 6.0.

Basically the issue is this: Should U+FFFC be treated as Default_Ignorable_Code_Point or not? Currently it is not, but the discussion by Asmus below suggests that it should be. (Remember, by Default_Ignorable_Code_Point, we currently mean that if an application does not otherwise support rendering of the code point, it should display as nothing, rather than as a black box missing glyph blort.)

Any decision by the UTC should consider the following additional points:

1. Any information we have about legacy behavior of existing implementations and fonts for U+FFFC, and whether it would be advisable to make changes that impact those.

2. Any original intent for display that might be contrary to Asmus' summary below.

3. Property consistency issues. In particular, U+FFFC is currently (and always has been) gc=So. Currently no Default_Ignorable_Code_Point character is gc=So, so deciding to make U+FFFC default ignorable would either introduce a new class into the derivation of Default_Ignorable_Code_Point or would necessitate a change of U+FFFC to some other General_Category value, which in turn would require checking the implications for other property consistency relations.

4. Note also that noncharacters (mentioned by Asmus as what would probably have been used for the object replacement function, had we had them before U+FFFC was defined) are *not* Default_Ignorable_Code_Point. They used to be (in Unicode 5.0 and earlier), but that was deliberately changed for Unicode 5.1.
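
As a rough illustration of the consistency issue in point 3, the following Python sketch uses the standard library's unicodedata module to compare the General_Category of U+FFFC with that of a few characters that are Default_Ignorable_Code_Point. The sample list and the helper names are chosen here purely for illustration and do not come from the discussion. Because unicodedata does not expose Default_Ignorable_Code_Point itself, the sample is hard-coded, and the values reported reflect whichever version of the Unicode Character Database the Python build ships with.

```python
import unicodedata

# U+FFFC OBJECT REPLACEMENT CHARACTER
OBJECT_REPLACEMENT = "\uFFFC"

# Illustrative sample only (an assumption of this sketch, not a full list):
# a few characters that are Default_Ignorable_Code_Point.
SAMPLE_DEFAULT_IGNORABLES = [
    "\u00AD",  # SOFT HYPHEN
    "\u200B",  # ZERO WIDTH SPACE
    "\u200D",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
]


def general_category(ch: str) -> str:
    """Return the General_Category (gc) short value for one character."""
    return unicodedata.category(ch)


def sample_categories() -> dict:
    """Map each sample code point (as 'U+XXXX') to its General_Category."""
    return {
        f"U+{ord(ch):04X}": general_category(ch)
        for ch in SAMPLE_DEFAULT_IGNORABLES
    }


if __name__ == "__main__":
    print("U+FFFC", unicodedata.name(OBJECT_REPLACEMENT),
          "gc =", general_category(OBJECT_REPLACEMENT))
    for code_point, gc in sample_categories().items():
        print(code_point, "gc =", gc)
```

A check of this kind only shows the gc values side by side. It does not settle whether the derivation of Default_Ignorable_Code_Point should be extended or the General_Category of U+FFFC changed.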

====================== quoted email ================================

> > On 12/23/2008 4:23 PM, Mark Davis wrote:
>> > >
>> > > 5. The following is a bit odd. The book rendering (in a dotted box)
>> > > makes it look like a default ignorable. If it isn't a default
>> > > ignorable, what should the visible rendering look like?
>> > >
>> > > OBJECT REPLACEMENT CHARACTER
>> > >
> >
> > This character got encoded as *invisible* anchor character for inline
> > objects. It's called "replacement" only since that was the easiest dodge
> > around the name police at the time, and FFFD was adjacent and called a
> > REPLACEMENT CHARACTER. Therefore, if U+FFFC isn't default ignorable, it
> > probably should be, or treated the nearest thing to default ignorable.
> >
> > Most rich text systems drop inline graphics without hint when exporting
> > plain text. U+FFFC was not intended (at the time) to change that, only
> > to allow a unique code to exist in the (internal) text buffer on to
> > which to hang character formatting in a regular way.
> >
> > As defined, the character *must not* get a default visible rendering.
> >
> > If implementations wanted to export it, the way that receivers dealt
> > with such data was not specified in the standard. That was somewhat
> > deliberate; such practice simply was not encouraged.
> >
> > Later, Unicode invented non-characters for similar needs - had the need
> > for an OBJECT REPLACEMENT CHARACTER been raised after that point, it
> > would have most likely been suggested to use a non-character, keeping it
> > officially out of the recommended set of character codes for
> > interchange. As it was, it was the first one where people realized that
> > they needed a character code that was guaranteed not any other (real)
> > character.
> >
> > A./

=======================================================================