From: Peter Kirk (email@example.com)
Date: Fri Jun 04 2004 - 16:21:52 CDT
On 25/05/2004 12:14, Kenneth Whistler wrote:
>>>>There is no consensus that this Phoenician proposal is necessary. I
>>>>and others have also put forward several mediating positions e.g.
>>>>separate encoding with compatibility decompositions
>>>Which was rejected by Ken for good technical reasons.
>>I don't remember any technical reasons, it was more a matter of "we
>>haven't done it this way before".
>The *reason* why we haven't done it this way before is because
>it would cause technical difficulties.
I am revisiting this one because I realise now that Ken has been
somewhat economical with the truth here. There ARE cases in which entire
alphabets have been given compatibility decompositions to other
alphabets. For example there are the Mathematical Alphanumeric Symbols,
the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as
well as superscripts, subscripts, modifier letters etc. These symbols
have these compatibility decompositions because they are not considered
to form a separate script, but rather to be glyph variants of characters
in the Latin, Greek, Katakana, etc. scripts. Do these compatibility
decompositions cause technical difficulties?
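To make the existing behaviour concrete, here is a small Python sketch (using the standard unicodedata module) showing how NFKC already folds some of these symbol alphabets into their base scripts:

```python
import unicodedata

# MATHEMATICAL BOLD CAPITAL A (U+1D400) carries a compatibility
# decomposition to plain LATIN CAPITAL LETTER A, so NFKC folds it.
assert unicodedata.normalize("NFKC", "\U0001D400") == "A"

# FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) folds the same way.
assert unicodedata.normalize("NFKC", "\uFF21") == "A"

# Canonical normalization alone (NFC) leaves both untouched;
# only the compatibility forms (NFKC/NFKD) perform this folding.
assert unicodedata.normalize("NFC", "\U0001D400") == "\U0001D400"
```

These equivalences ship in every conformant Unicode implementation today, so the folding itself is evidently workable in practice.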
>Compatibility decompositions directly impact normalization.
Of course. And the point of suggesting compatibility decomposition here
is precisely so that compatibility normalisation, as well as default
collation, folds together Phoenician and Hebrew variant glyphs of the
same characters.
>Cross-script equivalencing is done by transliteration algorithms,
>not by normalization algorithms.
This begs the question. Scholars of Semitic languages do not accept that
this is a cross-script issue. They do not accept that representing a
Phoenician, palaeo-Hebrew, etc. inscription with square Hebrew glyphs is
transliteration. Rather, for them it is a matter of replacing an
obsolete or non-standard glyph with a modern standard glyph for the same
character - just as one would not describe as transliteration the
representation in Times New Roman of a Latin-script text written in
mediaeval handwriting or in Fraktur.
>If you try to blur the boundary between those two by introducing
>compatibility decompositions to equate across separately encoded
>scripts, the net impact would be to screw up *both* normalization
>and transliteration by conflating the two. You
>would end up with confusion among both the implementers of
>such algorithms and the consumers of them.
I would suggest that a clear distinction should be made, in an
appropriate part of the Unicode Standard, between transliteration
(between separate scripts) and what one might call glyph normalisation
(between variant forms of the same script).
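The distinction being proposed can be illustrated with Python's unicodedata: NFKC already performs exactly this kind of within-script glyph folding, and it never maps across script boundaries:

```python
import unicodedata

# Glyph normalisation within a script: HALFWIDTH KATAKANA LETTER KA
# (U+FF76) folds to ordinary KATAKANA LETTER KA (U+30AB) under NFKC.
assert unicodedata.normalize("NFKC", "\uFF76") == "\u30AB"

# No cross-script folding: GREEK SMALL LETTER ALPHA (U+03B1) is left
# untouched by every normalization form. Mapping it to Latin 'a'
# would be transliteration, which normalization never performs.
assert unicodedata.normalize("NFKC", "\u03B1") == "\u03B1"
```

In other words, the halfwidth-to-katakana fold is "glyph normalisation" in the sense above, while alpha-to-a is left to transliteration algorithms.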
>>But perhaps that is only because the
>>need to do this has not previously been identified.
>No, that is not the case.
>>However, I can make
>>a good case for the new Coptic letters being made compatibility
>>equivalent to Greek - which can still be done, presumably -
>But will not be done. If you attempted to make your case, you
>would soon discover that even *if* such cross-script equivalencing
>via compatibility decompositions were a good idea (which it isn't),
>you would end up with inconsistencies, because some of the Coptic
>letters would have decompositions and some could not (because they
>are already in the standard without decompositions). You'd end
>up with a normalization nightmare (where some normalization
>forms would fold Coptic and Greek, and other normalization
>forms would not), while not having a transliteration solution.
This is not intended as a transliteration solution. It is intended to
recognise that *some* Coptic letters are glyph variants of Greek
letters, as previously recognised by the UTC, whereas *others* are not.
As a result only the former set would have compatibility decompositions
- and as it happens those are precisely the ones which are proposed for
new encoding, and so for which compatibility decompositions can still be
defined. This also has the major advantage that it folds together, for
normalisation and default collation, texts which have been encoded
according to the existing definitions for Coptic and those which will be
encoded according to the new definitions.
But I accept that this Coptic-to-Greek compatibility mapping has a few
problems, because not all characters would have mappings. However, this is not a problem
for Phoenician, because *every* Phoenician character has an unambiguous
compatibility mapping to an existing Hebrew character.
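As a sketch only: Unicode defines no such decompositions, and the Phoenician code points used below (U+10900..U+10915) are those of the proposal, not an established assignment. But the one-to-one fold being advocated would amount to something like this:

```python
# Hypothetical "glyph normalisation" fold from the 22 proposed
# Phoenician letters (U+10900..U+10915, in alphabetic order) to the
# 22 Hebrew base letters (final forms excluded). This mapping is an
# illustration of the proposal, not part of any Unicode data file.
PHOENICIAN_TO_HEBREW = dict(zip(
    (chr(cp) for cp in range(0x10900, 0x10916)),
    "אבגדהוזחטיכלמנסעפצקרשת",
))

def fold_phoenician(text: str) -> str:
    """Replace each Phoenician letter with its Hebrew equivalent."""
    return "".join(PHOENICIAN_TO_HEBREW.get(ch, ch) for ch in text)

# PHOENICIAN LETTER ALF would fold to HEBREW LETTER ALEF (U+05D0).
assert fold_phoenician("\U00010900") == "\u05D0"
assert len(PHOENICIAN_TO_HEBREW) == 22
```

Because both alphabets run in the same order and neither has gaps or ambiguities, the table is complete and deterministic - which is the point being made above.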
>I don't like the notion of interleaving in the default weighting
>table, and have spoken against it, but as John Cowan has pointed
>out, it is at least feasible. It doesn't have the ridiculousness
>factor of the compatibility decomposition approach.
If what I have suggested is ridiculous, so is what the UTC has already
defined for Mathematical Alphanumeric Symbols.
>The equivalencing of 22 Phoenician letters, one-to-one against
>Hebrew characters, where the mapping is completely known and
>uncontroversial, is a minor molehill.
Well, why not make these uncontroversial equivalences, between variant
glyphs of the same script, into compatibility decompositions?
--
Peter Kirk
firstname.lastname@example.org (personal)
email@example.com (work)
http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Fri Jun 04 2004 - 16:22:58 CDT