From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 02 2005 - 14:29:31 CST
Peter Kirk said:
> The character is indeed ambiguous in Unicode 4.0. But it must no longer
> be ambiguous in Unicode 4.1 or 5.0, because otherwise that would leave
> two equally valid ways of spelling the same word.
This is a fundamental misunderstanding. The standard does not work
that way, and disunifications create no such effect or requirement.
This is why Peter Constable has been claiming (correctly) that
the particular disunification(s) in question do not invalidate
any existing data. And failing to understand that seems to be
leading to this utterly nonconverging argumentation that plagues the
list, revisiting the same claims over and over and over again.
I'll illustrate with an example that is much more accessible to
most of the list participants than the arcana of Holam Haser and
Holam Male.
U+002D HYPHEN-MINUS is fundamentally ambiguous. It was used
ambiguously in ASCII for years before it ever made it into
ISO 8859-1 and hence into Unicode. The Unicode Standard introduced
a disambiguating disunification, by adding U+2010 HYPHEN (a dash
punctuation character) and U+2212 MINUS SIGN (a math symbol and operator).
The existence of U+2010 and U+2212 in the Unicode Standard does
not invalidate existing ASCII-based data using U+002D. They do
not create any obligation on users to spell things unambiguously
with the new characters and to eschew the ambiguous U+002D. It
merely allows them to, if they choose to do so and have the
proper contexts and tools to make use of them.
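A quick way to see that the new characters leave old data alone is to ask the Unicode Character Database itself. The sketch below assumes Python's standard unicodedata module. It prints the names and General Categories of the three characters and checks that none of the four normalization forms maps U+002D onto U+2010 or U+2212 (or vice versa), so text written with HYPHEN-MINUS is not rewritten by any standard normalization.

```python
import unicodedata

# The ambiguous ASCII character and the two disambiguated characters
# that were later encoded alongside it.
CHARS = {
    "HYPHEN-MINUS": "\u002d",
    "HYPHEN": "\u2010",
    "MINUS SIGN": "\u2212",
}

for label, ch in CHARS.items():
    # Name and General Category come from the UCD tables bundled with Python.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} "
          f"category={unicodedata.category(ch)}")

# No normalization form folds these characters together, so existing
# data using U+002D comes through every form unchanged.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for ch in CHARS.values():
        assert unicodedata.normalize(form, ch) == ch
```

HYPHEN-MINUS and HYPHEN both report category Pd, while MINUS SIGN reports Sm; the disunification added characters without touching the properties or normalization behavior of U+002D.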
The existence of U+2212 MINUS SIGN results in "two equally valid
ways of spelling" "-2", for instance. In some contexts, such as
the C family of programming languages, only one of those
can actually be used, namely <002D, 0032>, and that one uses
the *ambiguous* character, because only U+002D is allowed in
the formal syntax for expressions, and not U+2212. Some other
contexts, such as formal algebraic systems and mathematical
formula layout engines, may support both and distinguish clearly
between them, even to the point of using different layout rules
and/or glyphs for them. That is an implementation decision.
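Python behaves like C here: its numeric syntax accepts only U+002D as a sign. The minimal sketch below uses the built-in int() to show that <002D, 0032> parses while <2212, 0032> is rejected. The LAYOUT_ROLE table is purely hypothetical, standing in for a formula-layout context that chooses to treat the two characters as distinct symbols.

```python
def parses_as_int(text: str) -> bool:
    """Return True if Python's int() accepts text as an integer."""
    try:
        int(text)
    except ValueError:
        return False
    return True

# Hypothetical layout context that distinguishes the two characters
# (the role labels are illustrative, not from any real engine).
LAYOUT_ROLE = {
    "\u002d": "ambiguous hyphen-or-minus",
    "\u2212": "minus operator",
}

for spelling in ("\u002d2", "\u22122"):
    print(f"{spelling!r}: int() accepts={parses_as_int(spelling)}, "
          f"layout role={LAYOUT_ROLE[spelling[0]]}")
```

Both spellings are well-formed Unicode text; whether a given context accepts one, the other, or both is up to that context.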
What the Unicode Standard does not do and never *will* do,
is deprecate U+002D HYPHEN-MINUS for its intended (ambiguous)
usage, despite the fact that there are separately encoded
characters for HYPHEN and for MINUS SIGN. It should be utterly
obvious why the Unicode Standard will not do that: such a
change would be completely destabilizing.
Now take the discussion about Holams and plug them in at appropriate
places for HYPHEN-MINUS and MINUS SIGN, and you end up with essentially
the same situation -- only involving a much more obscure case in
Hebrew instead of an obvious case in ASCII.
And no, Dean, this is not an invitation to come re-argue the
case that any disambiguating disunification should (or must)
encode *both* of the disambiguated usages. It matters not
whether a disunification proceeds as:
X (:: A or B) ==> Y (:: A)
or as:
X (:: A or B) ==> Y (:: A)
              ==> Z (:: B)
In *either* case, you *still* are left with an X encoded, ambiguous
between meaning "A" or "B". And data that makes use of that X,
whether generated before *or* after the historical point that
the disambiguating disunification decision was taken, may still
be ambiguous in exactly the same way it was before such additions.
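The two schemas can be modeled as a toy repertoire. This is a hypothetical sketch, not how the standard or any implementation represents characters: each encoded character maps to the set of usages it may carry, and disunification only adds entries. Whether one or both disambiguated characters are added, the entry for X is never rewritten.

```python
from typing import Dict, FrozenSet

# Hypothetical model: encoded character -> set of usages it may represent.
Repertoire = Dict[str, FrozenSet[str]]

def disunify(rep: Repertoire, additions: Repertoire) -> Repertoire:
    """Add new, disambiguated characters. Encoding is additive, so
    existing entries (including the ambiguous X) are left as they were."""
    overlap = rep.keys() & additions.keys()
    if overlap:
        raise ValueError(f"already encoded: {sorted(overlap)}")
    return {**rep, **additions}

before: Repertoire = {"X": frozenset({"A", "B"})}

# Schema 1: only Y (:: A) is added.
one_sided = disunify(before, {"Y": frozenset({"A"})})
# Schema 2: both Y (:: A) and Z (:: B) are added.
two_sided = disunify(before, {"Y": frozenset({"A"}), "Z": frozenset({"B"})})

for name, rep in (("Y only", one_sided), ("Y and Z", two_sided)):
    # In either case X is still encoded and still ambiguous between A and B.
    print(name, sorted(rep["X"]))
```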
> At the very least the
> old representation, in this case Holam Haser on VAV represented as
> HOLAM, should be clearly deprecated as an obsolete spelling no longer to
> be supported.
No, it should not. Claiming that it should be is precisely what
is leading Jony to keep coming back to the list claiming that
this change represents a destabilization of existing Hebrew data.
--Ken