From: Kenneth Whistler (firstname.lastname@example.org)
Date: Tue May 25 2004 - 14:14:38 CDT
> >> There is no consensus that this Phoenician proposal is necessary. I
> >> and others have also put forward several mediating positions e.g.
> >> separate encoding with compatibility decompositions
> > Which was rejected by Ken for good technical reasons.
> I don't remember any technical reasons, it was more a matter of "we
> haven't done it this way before".
The *reason* why we haven't done it this way before is because
it would cause technical difficulties.
Compatibility decompositions directly impact normalization.
Cross-script equivalencing is done by transliteration algorithms,
not by normalization algorithms.
If you try to blur the boundary between those two by introducing
compatibility decompositions to equate across separately encoded
scripts, the net impact would be to screw up *both* normalization
and transliteration by conflating the two. You
would end up with confusion among both the implementers of
such algorithms and the consumers of them.
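To make the normalization point concrete: compatibility decompositions are applied irreversibly by NFKC/NFKD, so if Phoenician letters were given compatibility decompositions to Hebrew, any NFKC pass would silently turn Phoenician text into Hebrew and destroy the plain-text distinction. A minimal Python sketch using an existing compatibility decomposition (U+FF21 FULLWIDTH LATIN CAPITAL LETTER A):

```python
import unicodedata

# U+FF21 FULLWIDTH LATIN CAPITAL LETTER A has a <wide> compatibility
# decomposition to plain U+0041 'A'.  The canonical forms leave it
# alone; the compatibility forms fold it -- and the fold is one-way.
wide_a = "\uFF21"

print(unicodedata.normalize("NFC", wide_a) == wide_a)  # True: NFC keeps it
print(unicodedata.normalize("NFKC", wide_a) == "A")    # True: NFKC folds it
```

A hypothetical Phoenician-to-Hebrew compatibility decomposition would behave the same way: every normalizer that applies NFKC would perform the cross-script folding whether the user wanted transliteration or not.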
> But perhaps that is only because the
> need to do this has not previously been identified.
No, that is not the case.
> However, I can make
> a good case for the new Coptic letters being made compatibility
> equivalent to Greek - which can still be done, presumably -
But will not be done. If you attempted to make your case, you
would soon discover that even *if* such cross-script equivalencing
via compatibility decompositions were a good idea (which it isn't),
you would end up with inconsistencies, because some of the Coptic
letters would have decompositions and some could not (because they
are already in the standard without decompositions). You'd end
up with a normalization nightmare (where some normalization
forms would fold Coptic and Greek, and other normalization
forms would not), while not having a transliteration solution.
The UTC would, I predict, reject such a proposal out of hand.
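The inconsistency is easy to verify against the character database: the Coptic letters that have long been encoded in the Greek block carry no decomposition mappings, and normalization stability guarantees mean none can ever be added. A quick Python check (U+03E2 COPTIC CAPITAL LETTER SHEI is one such letter):

```python
import unicodedata

# Coptic letters encoded in the Greek-and-Coptic block since Unicode 1.0
# have no decomposition mappings, and normalization stability forbids
# adding any now.  A "Coptic decomposes to Greek" scheme could therefore
# never cover these letters, only newly encoded ones.
shei = "\u03E2"  # COPTIC CAPITAL LETTER SHEI

print(unicodedata.decomposition(shei) == "")              # True: none
print(unicodedata.normalize("NFKD", shei) == shei)        # True: unchanged
```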
> as well as
> for similar equivalences for scripts like Gothic and Old Italic, and
> perhaps Indic scripts - which presumably cannot now be added for
> stability reasons.
> >> and with interleaved collation,
> > Which was rejected for the default template (and would go against the
> > practices already in place in the default template) but is available
> > to you in your tailorings.
> Again, a matter of "we haven't done it this way before".
I don't like the notion of interleaving in the default weighting
table, and have spoken against it, but as John Cowan has pointed
out, it is at least feasible. It doesn't have the ridiculousness
factor of the compatibility decomposition approach.
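What a tailoring amounts to, in miniature: fold Phoenician letters onto their Hebrew counterparts before comparing, while the default table keeps the scripts apart. A toy Python sketch of that idea (the mapping covers only three letters here, and the table and function names are illustrative, not any shipping collator API):

```python
# Toy sketch of an interleaving tailoring: a sort key that folds
# Phoenician letters onto their Hebrew counterparts before comparison.
# FOLD and interleaved_key are illustrative names, not a real API.
FOLD = {
    "\U00010900": "\u05D0",  # PHOENICIAN LETTER ALF -> HEBREW LETTER ALEF
    "\U00010901": "\u05D1",  # PHOENICIAN LETTER BET -> HEBREW LETTER BET
    "\U00010915": "\u05EA",  # PHOENICIAN LETTER TAU -> HEBREW LETTER TAV
}

def interleaved_key(s: str) -> str:
    """Sort key that treats mapped Phoenician letters as their Hebrew twins."""
    return "".join(FOLD.get(ch, ch) for ch in s)

# Hebrew bet, Phoenician alf, Hebrew alef:
words = ["\u05D1", "\U00010900", "\u05D0"]
print(sorted(words, key=interleaved_key))
# Phoenician alf now sorts together with Hebrew alef, ahead of bet.
```

A real tailoring would say the same thing in collation rule syntax rather than a Python dict, but the effect is the point: the interleaving lives in the tailoring, not in the default table.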
> >> also encoding as variation sequences,
> > Which was rejected by Ken and others for good technical reasons, not
> > the least of which was the p%r%e%p%o%s%t%e%r%o%u%s%n%e%s%s% of
> > interleaving Hebrew text in order to get Phoenician glyphs.
> I don't like this one myself either.
So can we please just drop it?
> But I disagree on
> *preposterousness*. You consider this preposterous because you
> presuppose that these are entirely different scripts. Others consider it
> preposterous *not* to interleave Phoenician and Hebrew because they
> understand these to be glyph variants of the same script. For, as John
> Hudson has put it so clearly, for these people Phoenician and Hebrew
> letters are the same abstract characters, in different representations.
This is just restating the basic disagreement, for the umpteenth time.
> It is clear to me that Phoenician is *not* an entirely separate script.
> It seems to me that it comes somewhere between being the same script and
> being a separate one. (In other words, I don't entirely accept either of
> the strong traditions of scholarship.) Therefore complete separation is
> inappropriate, although I don't insist on complete unification.
O.k., so far, so good...
> So I am
> looking for a technical solution which comes somewhere between these two
> extremes, which officially recognises the one-to-one equivalence between
> Phoenician and (a subset of) Hebrew while making a plain text
> distinction possible for those who wish to make it.
The technical solution for that is:
A. Encode Phoenician as a separate script. (That accomplishes the
second task, of making a plain text distinction possible.)
B. Assert in the *documentation* that there is a well-known
one-to-one equivalence relationship between the letters of
this (and other 22CWSA) and Hebrew letters -- including the
publication of the mapping tables as proof of concept.
People (up to and including OS manufacturers, if they so choose), can
then make use of B in developing collation tables, search algorithms,
transliterations, or other kinds of equivalencing.
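The mapping table in B is short enough to publish inline. Here is a sketch of one direction of it as a transliteration: the 22 Phoenician letters U+10900..U+10915 against the non-final Hebrew letters (the code points are the standard ones; the table and helper names are mine, not from any published library):

```python
# One-to-one Phoenician -> Hebrew mapping (non-final Hebrew forms only).
# Code points are the standard ones; the names are illustrative.
PHOENICIAN_TO_HEBREW = dict(zip(
    (chr(cp) for cp in range(0x10900, 0x10916)),        # ALF .. TAU
    "\u05D0\u05D1\u05D2\u05D3\u05D4\u05D5\u05D6\u05D7"  # alef .. het
    "\u05D8\u05D9\u05DB\u05DC\u05DE\u05E0\u05E1\u05E2"  # tet .. ayin
    "\u05E4\u05E6\u05E7\u05E8\u05E9\u05EA",             # pe .. tav
))

def to_hebrew(text: str) -> str:
    """Transliterate Phoenician letters to Hebrew; pass other text through."""
    return "".join(PHOENICIAN_TO_HEBREW.get(ch, ch) for ch in text)

# Phoenician ALF BET -> Hebrew alef bet
print(to_hebrew("\U00010900\U00010901"))
```

Such a table is exactly the kind of thing that belongs in documentation and in transliteration or folding layers, rather than being baked into normalization.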
Where I get off, however, is in assuming that the recognition of
an equivalence has to be *further* baked into some normative
mechanism of the Unicode Standard itself. Attempting to force this
into normative behavior via compatibility decompositions or
variation sequences is likely to result in *worse* consequences,
in my opinion. The proper way forward is simply an assertion
(and publication) of appropriate equivalence relationships
for particular fields, and then getting on with the task of
implementation.
> > The technical solutions you have proposed have been inadequate.
> Can you suggest one which is more adequate? Or in fact are you
> determined to reject any solution, using doubtful technical arguments
> against the details because you have failed to produce convincing
> arguments against the principle?
Michael is correct. But don't expect *him* to provide you with
all the nitty-gritty dirt from *inside* the library, OS,
database, and application vendors' code, because he isn't that
level of implementer.
It would be more appropriate to direct such questions to the
people who actually write and implement normalization, collation,
transliteration, folding, and other kinds of equivalencing
operations in shipping software.
> >... the issue of
> >whether the 22 basic Semitic letters can also be represented in
> >a Phoenician script or not pales to the minor molehill it actually
> >is, in my opinion.
> Obviously a lot of people disagree with you on this one, Ken.
Of course they do. Otherwise Dean wouldn't be harping on this
point over and over.
But I have *seen* mountains. The equivalencing problem for
Hangul is a significant foothill, at least. The equivalencing and
variation problem for Han is a genuine mountain range.
The equivalencing of 22 Phoenician letters, one-to-one against
Hebrew characters, where the mapping is completely known and
uncontroversial, is a minor molehill.
This archive was generated by hypermail 2.1.5 : Tue May 25 2004 - 14:18:16 CDT