L2/10-081


Title:  Property Values for U+FFFC

Source: Ken Whistler

Date:   March 1, 2010 

Action: For consideration by the UTC


The email discussion appended below between Mark Davis
and Asmus Freytag (dating from late 2008) raises an issue
regarding U+FFFC OBJECT REPLACEMENT CHARACTER, which was
not resolved for Unicode 5.2. I suggest that the UTC
take this up and come to a resolution for Unicode 6.0.

Basically the issue is this: should U+FFFC be treated
as Default_Ignorable_Code_Point or not? Currently it is
not, but the discussion by Asmus below suggests that it
should be. (Remember, by Default_Ignorable_Code_Point,
we currently mean that if an application does not otherwise
support rendering of the code point, it should display
as nothing, rather than as a black box missing glyph blort.)
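The display rule in the parenthetical above can be illustrated with a small Python sketch. It is not taken from any implementation. The set of default ignorable code points is a hypothetical four-entry sample; a real renderer would load the full Default_Ignorable_Code_Point list from the UCD. U+25A1 WHITE SQUARE only stands in for a font's missing glyph. Because U+FFFC is currently not default ignorable, it falls through to the visible box.

```python
from __future__ import annotations

# Hypothetical sample of Default_Ignorable_Code_Point characters (not the full list;
# a real implementation would load DerivedCoreProperties.txt).
SAMPLE_DEFAULT_IGNORABLES = {
    0x00AD,  # SOFT HYPHEN
    0x034F,  # COMBINING GRAPHEME JOINER
    0x200B,  # ZERO WIDTH SPACE
    0xFEFF,  # ZERO WIDTH NO-BREAK SPACE
}

# Stand-in for the font's missing-glyph box (an assumption for illustration).
MISSING_GLYPH = "\u25A1"

def fallback_display(text: str, supported: set[int]) -> str:
    """Return what a renderer would show, given the code points it supports."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in supported:
            out.append(ch)             # renderer handles it: draw normally
        elif cp in SAMPLE_DEFAULT_IGNORABLES:
            continue                   # unsupported but default ignorable: show nothing
        else:
            out.append(MISSING_GLYPH)  # unsupported otherwise: visible box
    return "".join(out)
```

For example, with only a, b, and c supported, `fallback_display("a\u200Bb\uFFFCc", {ord(c) for c in "abc"})` drops the ZERO WIDTH SPACE but shows a box for U+FFFC, returning `"ab\u25A1c"`.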

Any decision by the UTC should consider the following
additional points:

1. Any information we have about legacy behavior of existing
   implementations and fonts for U+FFFC, and whether it
   would be advisable to make changes that impact those.
   
2. Any original intent for display that might be contrary
   to Asmus' summary below.
   
3. Property consistency issues. In particular, U+FFFC is
   currently (and always has been) gc=So. Currently no
   Default_Ignorable_Code_Point character is gc=So, so
   deciding to make U+FFFC default ignorable would either
   introduce a new class into the derivation of
   Default_Ignorable_Code_Point or necessitate a change
   of U+FFFC to some other General_Category value, which in
   turn would require checking the implications for other
   property consistency relations.
   
4. Note also that noncharacters (mentioned by Asmus as what
   would probably have been used for the object replacement
   function, had we had them before U+FFFC was defined) are
   *not* Default_Ignorable_Code_Point. They used to be (in
   Unicode 5.0 and earlier), but that was deliberately changed
   for Unicode 5.1.
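Points 3 and 4 lend themselves to a mechanical check. The following Python sketch is an illustration, not a UTC tool. It reads lines in the "XXXX..YYYY ; Property # comment" format used by DerivedCoreProperties.txt, tallies the General_Category of each listed code point with the standard unicodedata module, and tests the noncharacter ranges (U+FDD0..U+FDEF plus the last two code points of each plane). The excerpt is illustrative only; the full file must be supplied by the caller. Note also that unicodedata reflects whatever Unicode version the Python interpreter bundles, which need not match the version under discussion.

```python
import unicodedata
from collections import Counter
from typing import Iterable, Iterator

def parse_property_ranges(lines: Iterable[str], prop: str) -> Iterator[int]:
    """Yield every code point listed for `prop` in UCD-style property lines."""
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data or ";" not in data:
            continue                           # blank, comment-only, or malformed
        field, name = (part.strip() for part in data.split(";", 1))
        if name != prop:
            continue
        if ".." in field:                      # a range such as 200B..200F
            lo, hi = (int(x, 16) for x in field.split("..", 1))
        else:                                  # a single code point such as 00AD
            lo = hi = int(field, 16)
        yield from range(lo, hi + 1)

def general_categories(code_points: Iterable[int]) -> Counter:
    """Count General_Category values as reported by this Python's unicodedata."""
    return Counter(unicodedata.category(chr(cp)) for cp in code_points)

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)

# Illustrative excerpt in the DerivedCoreProperties.txt format (not the full list).
EXCERPT = """
00AD          ; Default_Ignorable_Code_Point # Cf       SOFT HYPHEN
034F          ; Default_Ignorable_Code_Point # Mn       COMBINING GRAPHEME JOINER
200B..200F    ; Default_Ignorable_Code_Point # Cf   [5] ZERO WIDTH SPACE..RIGHT-TO-LEFT MARK
""".splitlines()

di = list(parse_property_ranges(EXCERPT, "Default_Ignorable_Code_Point"))
print(general_categories(di))
print(unicodedata.category("\uFFFC"), is_noncharacter(0xFFFC))
```

Run over the full Default_Ignorable_Code_Point list, the absence of "So" from the resulting Counter is the consistency relation described in point 3, while U+FFFC itself reports "So" and is not a noncharacter.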
   
====================== quoted email ================================

> > On 12/23/2008 4:23 PM, Mark Davis wrote:
>> > >
>> > >
>> > > 5. The following is a bit odd. The book rendering (in a dotted box) 
>> > > makes it look like a default ignorable. If it isn't a default 
>> > > ignorable, what should the visible rendering look like?
>> > >
>> > > OBJECT REPLACEMENT CHARACTER
>> > >
> > 
> > This character got encoded as an *invisible* anchor character for inline 
> > objects. It's called "replacement" only since that was the easiest dodge 
> > around the name police at the time, and FFFD was adjacent and called a 
> > REPLACEMENT CHARACTER. Therefore, if U+FFFC isn't default ignorable, it 
> > probably should be, or be treated as the nearest thing to default ignorable.
> > 
> > Most rich text systems drop inline graphics without hint when exporting 
> > plain text. U+FFFC was not intended (at the time) to change that, only 
> > to allow a unique code to exist in the (internal) text buffer onto 
> > which to hang character formatting in a regular way.
> > 
> > As defined, the character *must not* get a default visible rendering.
> > 
> > If implementations wanted to export it, the way that receivers dealt 
> > with such data was not specified in the standard. That was somewhat 
> > deliberate; such practice simply was not encouraged.
> > 
> > Later, Unicode invented non-characters for similar needs - had the need 
> > for an OBJECT REPLACEMENT CHARACTER been raised after that point, it 
> > would have most likely been suggested to use a non-character, keeping it 
> > officially out of the recommended set of character codes for 
> > interchange. As it was, it was the first one where people realized that 
> > they needed a character code that was guaranteed not to be any other (real) 
> > character.
> > 
> > A./
=======================================================================
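The inline-object model Asmus describes in the email above can be sketched in Python as follows. The RichBuffer class is hypothetical and models no particular rich text system. Each inline object is stored out of band, U+FFFC marks its position in the text so that character formatting can attach to it like any other character, and plain-text export drops the anchors without a trace.

```python
from dataclasses import dataclass, field

OBJ = "\uFFFC"  # OBJECT REPLACEMENT CHARACTER

@dataclass
class RichBuffer:
    """Hypothetical rich-text buffer in which inline objects are anchored by U+FFFC."""
    text: str = ""
    objects: list = field(default_factory=list)  # one entry per U+FFFC, in text order

    def append_text(self, s: str) -> None:
        self.text += s

    def insert_object(self, obj) -> None:
        # The anchor occupies one character position, so formatting runs
        # can cover it exactly as they cover ordinary characters.
        self.text += OBJ
        self.objects.append(obj)

    def export_plain_text(self) -> str:
        # Drop the anchors "without hint", as most rich text systems do.
        return self.text.replace(OBJ, "")
```

With this design the anchor never needs a visible rendering of its own: the object it stands for is drawn from the side list, and the plain-text export simply omits it.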