From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Mar 02 2005 - 14:34:42 CST
It's time to bring a bit more systematic treatment to the
discussion of stability. Here's a rundown:
The problem:
A character AB is used ambiguously to represent both A and B
Possible scenarios:
- Do nothing
Everybody uses AB as before. Users and software must rely
on context to distinguish A and B. If context does not
allow for reliable distinction, but the shapes of A and B
are different, software will fail to meet users' expectations.
- Add both A *and* B as new characters
Existing data uses AB. Users can use AB if they want to
represent absence of information (i.e. if the user can't
decide whether A or B is intended in the text when typing
it in). New data can use A and B. However, there will be
a transition period where neither A nor B is supported
by fonts and software. (This is true for all new character
additions). New software will need to add explicit support
to map A and B to AB for searches. Old software will not
support matching A and B with AB in searches.
- Add B as a new character
Existing data uses AB. New software will assume it's A.
Where the shape of an 'A' matches that of an AB, old data
will display as before.
If the user can't decide whether A or B is intended in
the text when typing it in, AB should be used by default.
In principle, the use of AB no longer indicates an
ambiguous situation. However, if the contexts would not
allow an 'A', software could consider this as an indication
that *old* data is present, and treat AB as a B.
New data can use B to unambiguously mark a usage as B.
However, there will be a transition period where B is not
yet supported by fonts and software. (This is true for
all new character additions). New software will need to
add explicit support to map B to AB for searches. Old
software will not support matching B with AB in searches.
- Add A as a new character
The same. The only difference between these two is which
interpretation of AB is considered the default. If one
of the two shapes A or B is vastly more prevalent, or
if the use of either shape is a permissible fall-back,
that shape would make the better default.
- Add a variation selector for B
Same as adding B as a character, except that all software
could elect to ignore the variation selector, treating
all instances of AB as ambiguous. Some software ignores
all variation selectors, for example in searching and
sorting. Such software will not be able to make the
distinction between AB and B. In other words, the 'fix'
is limited to display and rendering. Software or fonts
that rely on the presence of the variation selector may
not display AB the same as old software did for old data.
(See discussion above).
There will be a transition period until all software can
either handle or ignore the variation selector. Until
then, text with the variation selector may display
incorrectly, appear broken, or result in processing
problems.
- Add a variation selector for A
The same.
- Add a variation selector for AB
Generally the same as adding a variation selector for
either A or B, but explicitly supports the use of
a standalone AB as ambiguous.
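
The search behavior described for the "add new characters"
scenarios can be sketched as follows. This is only an
illustration: the three code points are private-use stand-ins
chosen for the example, not real assignments, and the function
names are hypothetical.

```python
# Hypothetical stand-ins (private-use code points, NOT real assignments).
AB = "\uE000"  # the existing, ambiguous character
A = "\uE001"   # hypothetical new character for the A reading
B = "\uE002"   # hypothetical new character for the B reading

# New software folds the disunified characters back to AB before matching.
FOLD = str.maketrans({A: AB, B: AB})


def fold_for_search(text: str) -> str:
    """Map A and B to AB so that old and new data compare equal."""
    return text.translate(FOLD)


def new_software_find(needle: str, haystack: str) -> bool:
    """Search that explicitly supports matching A and B with AB."""
    return fold_for_search(needle) in fold_for_search(haystack)


def old_software_find(needle: str, haystack: str) -> bool:
    """Search with no knowledge of the new characters."""
    return needle in haystack


old_data = "x" + AB + "y"   # written before the disunification
query = "x" + B             # typed with the new character
print(new_software_find(query, old_data))
print(old_software_find(query, old_data))
```

With these definitions the folded search matches new queries
against old data, while the unfolded search does not, which is
the gap between new and old software noted in the scenarios.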
Stability evaluation:
This is the complete matrix. In all cases there are
circumstances where the software violates users' expectations.
This is true even for the 'do nothing' case, since by
definition, the use of AB for both A and B is considered
a problem. Otherwise we would not look for a solution.
However, there are important differences between the solutions
that need to be considered. They affect the stability of
software and the stability of data in different ways.
One of the working assumptions of Unicode is that data are
forever. Once data exists in a particular form, it is expected
that software will continue to handle it. On the other hand,
software is expected to undergo regular updating. That is
already needed to handle additions to the Unicode standard,
as well as other technological changes that are not affected
by the particular issue (disunification of AB).
In all scenarios, old software will continue to handle old
data as before. In no scenario will old software handle new
data without problems. In the future, if variation selectors
were correctly implemented by *all* software, the default
processing of variation selectors might allow 'old' software
to handle new data as if it was old data in some of the
scenarios. That, however, is not the state of the art.
New software will handle old data as before, except in those
scenarios where only one variation sequence or one new character
is added. The assumption is that such a scenario would be
selected only if the shape used for AB would remain unchanged.
Under that assumption, new software would continue to display
old data as before.
The fact that new software and new fonts will be needed to
support any of these scenarios, other than 'do nothing', is
a temporary issue. It is no different than for any other
addition of characters.
Role of variation selectors:
Semantically, variation selectors are intended to act as if
they didn't exist. In other words, processes that act on
the content of the data are assumed to ignore variation
selectors, whereas processes acting on the appearance of
data are supposed to take variation selectors into account.
Because of this, where a distinction in content is desired,
the encoding of new characters should be considered instead
of the addition of variation selectors. Where the issue is
one of mere appearance, variation sequences can be an
option.
In other words, variation selectors should not be used to
encode optional semantic differences, but only optional
glyphic differences.
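
The split between content processes and appearance processes
might look like this in code. It is a minimal sketch, assuming
only the two variation selector blocks U+FE00..U+FE0F and
U+E0100..U+E01EF; the function names are hypothetical, and a
real implementation would consult the Unicode character
properties rather than hard-coded ranges.

```python
def is_variation_selector(ch: str) -> bool:
    """True for the code points in the two ranges assumed above."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def content_key(text: str) -> str:
    """Key for content processes (search, sort): selectors are dropped."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


def display_text(text: str) -> str:
    """Appearance processes keep the selectors for the renderer."""
    return text


plain = "X"
varied = "X\uFE00"  # same character followed by a variation selector

print(content_key(plain) == content_key(varied))    # same content
print(display_text(plain) == display_text(varied))  # different appearance data
```

Because the content key discards the selector, a distinction
encoded only with a variation selector is invisible to search
and sorting, which is why a distinction in content calls for
a new character instead.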
A./