Re: Unicode Stability

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Mar 02 2005 - 14:34:42 CST

Next message: Tom Emerson: "Re: teh marbuta"

Previous message: Kenneth Whistler: "Ambiguity and disunification (was: a commodious vicus of Hebrew recirculation from: Re: Unicode Stability)"
In reply to: Peter Kirk: "Re: Unicode Stability"
Next in thread: Peter Kirk: "Re: Unicode Stability"
Reply: Peter Kirk: "Re: Unicode Stability"

It's time to bring a bit more systematic treatment to the
discussion of stability. Here's a rundown:

The problem:

A character AB is used ambiguously to represent both A and B

Possible scenarios:

- Do nothing

   Everybody uses AB as before. Users and software must rely
   on context to distinguish A and B. If context does not
   allow a reliable distinction, but the shapes of A and B
   are different, software will fail to meet users' expectations.

- Add both A *and* B as new characters

   Existing data uses AB. Users can use AB if they want to
   represent absence of information (i.e. if the user can't
   decide whether A or B is intended in the text when typing
   it in). New data can use A and B. However, there will be
   a transition period in which neither A nor B is supported
   by fonts and software. (This is true for all new character
   additions.) New software will need to add explicit support
   to map A and B to AB for searches. Old software will not
   support matching A and B with AB in searches.

- Add B as a new character

   Existing data uses AB. New software will assume it's A.
   Where the shape of an 'A' matches that of an AB, old data
   will display as before.
   If the user can't decide whether A or B is intended in
   the text when typing it in, AB should be used by default.
   In principle, the use of AB no longer indicates an
   ambiguous situation. However, if the context would not
   allow an 'A', software could take this as an indication
   that *old* data is present, and treat AB as a B.
   New data can use B to unambiguously mark a usage as B.
   However, there will be a transition period in which B is
   not yet supported by fonts and software. (This is true for
   all new character additions.) New software will need to
   add explicit support to map B to AB for searches. Old
   software will not support matching B with AB in searches.

- Add A as a new character

   The same. The only difference between these two is which
   interpretation of AB is considered the default. If one
   of the two shapes A or B is vastly more prevalent, or
   if the use of either shape is a permissible fall-back,
   that shape would make the better default.

- Add a variation selector for B

   Same as adding B as a character, except that any software
   could elect to ignore the variation selector, treating
   all instances of AB as ambiguous. Some software ignores
   all variation selectors, for example in searching and
   sorting. Such software will not be able to make the
   distinction between AB and B. In other words, the 'fix'
   is limited to display and rendering. Software or fonts
   that rely on the presence of the variation selector may
   not display AB the same as old software did for old data.
   (See discussion above.)
   There will be a transition period until all software can
   either handle or ignore the variation selector. Until
   then, text with the variation selector may display
   incorrectly, appear broken, or result in processing
   problems.

- Add a variation selector for A

   The same.

- Add a variation selector for AB

   Generally the same as adding a variation selector for
   either A or B, but explicitly supports the use of
   a standalone AB as ambiguous.
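
The "explicit support to map A and B to AB for searches" mentioned in the scenarios above could look roughly like the following. This is only a minimal sketch: the three code points are Private Use Area stand-ins chosen for illustration, not real or proposed assignments, and the names `fold_for_search` and `search_matches` are hypothetical.

```python
# Hypothetical stand-ins: Private Use Area code points, not real assignments.
AB = "\uE000"  # the existing, ambiguous character
A = "\uE001"   # hypothetical new character for reading A
B = "\uE002"   # hypothetical new character for reading B

# Translation table that folds both new characters onto the legacy one.
FOLD_TO_AB = str.maketrans({A: AB, B: AB})


def fold_for_search(text: str) -> str:
    """Return text with A and B replaced by AB, for matching only."""
    return text.translate(FOLD_TO_AB)


def search_matches(query: str, text: str) -> bool:
    """New-software search: compare query and text after folding."""
    return fold_for_search(query) in fold_for_search(text)


def naive_matches(query: str, text: str) -> bool:
    """Old-software search: plain code point comparison, no folding."""
    return query in text
```

With folding, a query typed with the new A finds old data that contains AB. Without it, as in old software, the same query finds nothing. The stored text is never changed; the folding applies only to the comparison.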

Stability evaluation:

This is the complete matrix. In all cases there are
circumstances where the software violates users' expectations.
This is true even for the 'do nothing' case, since by
definition, the use of AB for both A and B is considered
a problem. Otherwise we would not look for a solution.

However, there are important differences between the solutions
that need to be considered. They affect the stability of
software and the stability of data in different ways.

One of the working assumptions of Unicode is that data are
forever. Once data exists in a particular form, it is expected
that software will continue to handle it. On the other hand,
software is expected to undergo regular updating. That is
already needed to handle additions to the Unicode standard,
as well as other technological changes that are not affected
by the particular issue (disunification of AB).

In all scenarios, old software will continue to handle old
data as before. In no scenario will old software handle new
data without problems. In the future, if variation selectors
were correctly implemented by *all* software, the default
processing of variation selectors might allow 'old' software
to handle new data as if it were old data in some of the
scenarios. That, however, is not the state of the art.

New software will handle old data as before, except in those
scenarios where only one variation sequence or one new character
is added. The assumption is that such a scenario would be
selected only if the shape used for AB would remain unchanged.
Under that assumption, new software would continue to display
old data as before.

The fact that new software and new fonts will be needed to
support any of these scenarios, other than 'do nothing', is
a temporary issue. It is no different than for any other
addition of characters.

Role of variation selectors:

Semantically, variation selectors are intended to act as if
they didn't exist. In other words, processes that act on
the content of the data are assumed to ignore variation
selectors, whereas processes acting on the appearance of
data are supposed to take variation selectors into account.

Because of this, where a distinction in content is desired,
the encoding of new characters should be considered instead
of the addition of variation selectors. Where the issue is
one of mere appearance, variation sequences can be an
option.

In other words, variation selectors should not be used to
encode optional semantic differences, but only optional
glyphic differences.
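
The split described above, between processes acting on content and processes acting on appearance, can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names are hypothetical, only the two main variation selector ranges (U+FE00..U+FE0F and U+E0100..U+E01EF) are covered, and the pairing of "X" with a selector is made up for the example.

```python
# Variation selector ranges: VS1..VS16 and VS17..VS256.
# (The Mongolian free variation selectors are left out of this sketch.)
_VS_RANGES = ((0xFE00, 0xFE0F), (0xE0100, 0xE01EF))


def is_variation_selector(ch: str) -> bool:
    """True if the single character ch is in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _VS_RANGES)


def content_key(text: str) -> str:
    """Key for content processes (search, sort): selectors are ignored."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def display_units(text: str) -> list:
    """Units for appearance processes: each selector stays with its base."""
    units = []
    for ch in text:
        if is_variation_selector(ch) and units:
            units[-1] += ch  # attach the selector to the preceding base
        else:
            units.append(ch)
    return units
```

Under this sketch, "X" and "X" followed by U+FE00 produce the same `content_key`, so searching and sorting cannot tell them apart, while `display_units` keeps the sequence intact for a renderer that knows the variant. That is why a distinction in content calls for a new character rather than a variation sequence.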

A./


This archive was generated by hypermail 2.1.5 : Wed Mar 02 2005 - 14:36:27 CST