From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Wed Mar 02 2005 - 17:40:03 CST
At 03:04 PM 3/2/2005, Peter Kirk wrote:
>Doug Ewell's definition of stability that "it does not change in a way
>that causes existing implementations or data to break".
As stated, that definition is clearly nonsense.
By assumption, existing implementations correctly handle only existing
data. New data will always be able to contain characters at hitherto
unassigned positions.
It is always possible for (badly written) existing implementations to
'break' when exposed to new data. However, there is some predictability to
allow more forward compatibility: some ranges for default-ignorable
characters include unallocated code positions so that it is possible for
old implementations to have ignored a range and therefore be able to ignore
'future' ignorables.
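
As a rough sketch of that forward-compatibility idea: an implementation that ignores a whole range, rather than only the characters assigned when it was written, will also skip ignorables assigned later. The range table below is an illustrative subset, not the full Default_Ignorable_Code_Point property, and the function names are made up for this example.

```python
# Illustrative subset only: one default-ignorable range that deliberately
# includes code points that are still unassigned (an assumption to verify
# against the current DerivedCoreProperties.txt).
IGNORABLE_RANGES = [
    (0xE0000, 0xE0FFF),  # tag characters, variation selectors, reserved gaps
]


def is_default_ignorable(ch: str) -> bool:
    """Return True if the character falls inside any listed range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IGNORABLE_RANGES)


def strip_ignorables(text: str) -> str:
    """Drop every character in an ignorable range, assigned or not."""
    return "".join(ch for ch in text if not is_default_ignorable(ch))


# U+E0001 is assigned (LANGUAGE TAG); U+E0FFF is currently unassigned.
# Both are skipped, because the test is on the range, not on assignment.
print(strip_ignorables("a\U000E0001b\U000E0FFFc"))
```

Because the check never asks whether a code point is assigned, an old build of this code treats a future ignorable exactly as it treats today's.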
More importantly, new implementations must (be able to) act on existing
data the same way old implementations did. That precludes moving a
character, otherwise new implementations would apply the new definition and
mis-interpret old data.
It also precludes (in principle) a change in definition of a character such
that some interpretations of that character are no longer supported. So the
cleanest way for a disunification would always be to add a *pair* of
characters. (Ken's HYPHEN-MINUS, HYPHEN and MINUS example, or my AB + A + B
example).
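
The two constraints above (no moving a character, no narrowing of its supported interpretations) can be modelled as a simple check. This is only a toy model: the dictionaries, the interpretation labels and the function name are hypothetical and do not correspond to any Unicode data file.

```python
def preserves_old_data(old: dict, new: dict) -> bool:
    """True if every old code point still exists in `new` and still
    supports all of the interpretations it supported in `old`."""
    return all(cp in new and interps <= new[cp] for cp, interps in old.items())


# Before disunification: one ambiguous character (the "AB" case).
old = {0x002D: {"hyphen", "minus"}}

# AB + A + B: keep the ambiguous character and add an unambiguous pair.
pair_added = {
    0x002D: {"hyphen", "minus"},
    0x2010: {"hyphen"},
    0x2212: {"minus"},
}

# Narrowing: U+002D redefined as hyphen only, minus split off.
narrowed = {0x002D: {"hyphen"}, 0x2212: {"minus"}}

# Moving: the ambiguous character reassigned to a different code point.
moved = {0x2010: {"hyphen", "minus"}}

print(preserves_old_data(old, pair_added))  # True
print(preserves_old_data(old, narrowed))    # False
print(preserves_old_data(old, moved))       # False
```

Under this model, adding characters never fails the check; only removing or narrowing an existing one does.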
However, there are cases where that's a foolish consistency. The cleanest
case is the use of standard Greek letters for Coptic. In principle, we
would have needed three alphabets: the existing characters (for potentially
ambiguous mixed Greek/Coptic use as defined in Unicode 1.0 through 4.0),
the new Coptic characters (as drafted for Unicode 4.1) and a new set of
unambiguous Greek characters so that there is absolutely no possibility
that these 'might' be Coptic.
In practice that would not have worked. All the mappings are to the
existing Greek characters. Users of Greek would have simply continued to
use the 'ambiguous' ones, which everyone treats as 'Greek' by default
anyway. All the new characters would have done is to create potential
alternate spellings.
It is therefore better to just add the Coptic, which allows users of that
script the desired unambiguous representation of their script. Greek users
do not need to change, and to the extent that there is existing Coptic data
using the mixed model it would continue to be supported - under the same
restrictions as before, i.e. with use of a Coptic-specific font. (In other
words, this is the AB + B or AB + A example in terms of my earlier message.)
This works because Greek use of the existing Greek letters is the
overwhelming majority of all use, and had been the de-facto default
interpretation. The occasional use of Greek code points with a Coptic font
has never been a practical problem for Greek users, so that use can
continue if needed for the support of existing data.
In contrast, in the case of the HYPHEN-MINUS, both
interpretations are equally likely (or nearly so), therefore adding just
one other character would have incorrectly forced a single default
interpretation on the character. Clearly, this is a case where being able
to explicitly distinguish the ambiguous case is useful and desired, so this
was correctly implemented as an AB + A + B case.
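
For readers who want to look at the characters involved, here is a small sketch using Python's standard unicodedata module. It assumes a Python build whose Unicode database is at version 4.1 or later, so that the Coptic block is present; the choice of alpha/alfa as the sample letter is arbitrary.

```python
import unicodedata

# AB + A + B case: the ambiguous character plus an unambiguous pair.
# AB + B case: the existing Greek letter plus a separately encoded Coptic one.
SAMPLES = ["\u002D", "\u2010", "\u2212", "\u03B1", "\u2C81"]

# Map each code point to its formal character name.
names = {ord(ch): unicodedata.name(ch) for ch in SAMPLES}

for cp, name in names.items():
    print(f"U+{cp:04X}  {name}")
```

Note that U+002D and U+03B1 keep their original code points and names; the disunified characters were added alongside them.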
A./