Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

From: Kenneth Whistler
Date: Fri Jun 04 2004 - 20:11:26 CDT



    > >>>>There is no consensus that this Phoenician proposal is necessary. I
    > >>>>and others have also put forward several mediating positions e.g.
    > >>>>separate encoding with compatibility decompositions
    > >>>>
    > >>>Which was rejected by Ken for good technical reasons.
    > >>>
    > >>I don't remember any technical reasons, it was more a matter of "we
    > >>haven't done it this way before".
    > >
    > >The *reason* why we haven't done it this way before is because
    > >it would cause technical difficulties.
    > I am revisiting this one because I realise now that Ken has been
    > somewhat economical with the truth here.

    No more economical than Peter has apparently been with good sense.

    > There ARE cases in which entire
    > alphabets have been given compatibility decompositions to other
    > alphabets.
    The operative word here is alphabets, as should be obvious. These
    are not separate scripts. If they *had* been treated as separate
    scripts, they would have been named differently and would *not*
    have gotten compatibility decompositions.

    > For example there are the Mathematical Alphanumeric Symbols,
    > the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as
    > well as superscripts, subscripts, modifier letters etc. These symbols
    > have these compatibility decompositions because they are not considered
    > to form a separate script,
    > but rather to be glyph variants of characters
    > in Latin, Greek, Katakana etc script.

    Not a complete characterization. They are "presentation variants"
    or other specifically styled versions of the alphabets,
    encoded distinctly for one or another compatibility reason
    presented in the history of the encoding. Nearly all of them
    have a preexisting encoded (or named entity) existence that the
    standard required mapping to.

    That situation differs from random individual glyph variants
    of characters, which do not get the same treatment.
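
    That within-script folding is easy to verify with any conformant
    normalizer. A quick illustration in Python, using the standard-library
    unicodedata module:

```python
import unicodedata

# MATHEMATICAL BOLD CAPITAL A (U+1D400) carries a <font> compatibility
# decomposition to plain LATIN CAPITAL LETTER A -- same script, styled form.
assert unicodedata.normalize("NFKD", "\U0001D400") == "A"

# FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) likewise folds to "A" ...
assert unicodedata.normalize("NFKD", "\uFF21") == "A"

# ... and HALFWIDTH KATAKANA LETTER HA folds to ordinary KATAKANA HA.
# In every case the decomposition stays inside the character's own script.
assert unicodedata.normalize("NFKD", "\uFF8A") == "\u30CF"
```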

    > Do these compatibility
    > decompositions cause technical difficulties?

    In some contexts, yes.

    > >Compatibility decompositions directly impact normalization.
    > Of course. And the point of suggesting compatibility decomposition here
    > is precisely so that compatibility normalisation, as well as default
    > collation, folds together Phoenician and Hebrew variant glyphs of the
    > same script.

    I understood what you were trying to do. It was obvious, and there
    was no need to repeat it.

    What I was attempting to tell you is that there is no chance that
    the UTC is going to start using compatibility decompositions as
    a way to do folding *between* scripts.

    If Phoenician (~ Old Canaanite) is separately encoded from Hebrew,
    compatibility decompositions will *not* go in for folding them.

    If Phoenician (~ Old Canaanite) is *not* separately encoded from
    Hebrew, then there won't be any separately encoded Phoenician
    characters to have compatibility decompositions for, and thus,
    again, compatibility decompositions will *not* go in for folding them.

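    And indeed that is how it turned out: checking against a modern Unicode
    Character Database (Phoenician was eventually encoded at U+10900..U+1091F,
    in Unicode 5.0), no normalization form folds one script into another. A
    minimal Python check:

```python
import unicodedata

alf = "\U00010900"   # PHOENICIAN LETTER ALF (encoded in Unicode 5.0)
alef = "\u05D0"      # HEBREW LETTER ALEF

# The Phoenician letter has no decomposition mapping at all, so it is
# its own normalization under every form, and never folds to Hebrew.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, alf) == alf
    assert unicodedata.normalize(form, alf) != alef
```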
    And just as you claim my statement below begs the question
    regarding the status of Phoenician, so does your characterization
    of "Phoenician and Hebrew variant glyphs of the same script".

    > >Cross-script equivalencing is done by transliteration algorithms,
    > >not by normalization algorithms.

    > This begs the question. Scholars of Semitic languages do not accept that
    Recte: Some scholars
    > this is a cross-script issue. They do not accept that representation of
    Recte: Some scholars of Semitic languages
    > a Phoenician, palaeo-Hebrew etc inscription with square Hebrew glyphs is
    > transliteration. Rather, for them it is a matter of replacing an
    > obsolete or non-standard glyph by a modern standard glyph for the same
    > character - just as one would not describe as transliteration
    Bogus analogy alert.
    > representation in Times New Roman of a Latin script text in mediaeval
    > handwriting or in Fraktur.

    This begs the question.

    > I would suggest that a clear distinction should be made, in an
    > appropriate part of the Unicode Standard, between transliteration
    > (between separate scripts) and what one might call glyph normalisation
    > (between variant forms of the same script).

    No one is going to call that "glyph normalization", as that would
    just further muddy the waters, rather than clarifying anything.

    In my opinion the standard is already quite clear about this,
    and users of the standard (implementing text rendering,
    text normalization, text transliteration) have had no particular
    problem understanding what they were supposed to be doing.

    The issue is simply the difficulty of coming to consensus, for
    certain archaic collections of writing systems, on what constitutes
    an encodable script boundary and what does not.

    And that, my friend, was obvious a *MONTH* ago in this thread.

    > >>However, I can make
    > >>a good case for the new Coptic letters being made compatibility
    > >>equivalent to Greek - which can still be done, presumably -

    > >
    > >But will not be done. ...
    > This is not intended as a transliteration solution. It is intended to
    > recognise that *some* Coptic letters are glyph variants of Greek
    > letters, as previously recognised by the UTC, whereas *others* are not.
    > As a result only the former set would have compatibility decompositions
    > - and as it happens those are precisely the ones which are proposed for
    > new encoding, and so for which compatibility decompositions can still be
    > defined. This also has the major advantage that it folds together, for
    > normalisation and default collation, texts which have been encoded
    > according to the existing definitions for Coptic and those which will be
    > encoded according to the new definitions.

    At the risk of sounding like a Michael Everson clone:

    But will not be done.

    I don't know why you persist in making suggestions that have zero
    chance of obtaining any consensus in the UTC -- trying to misapply
    mechanisms that were designed for other purposes, and that are
    constrained by how they are currently implemented and tied to
    normalization, to completely unrelated kinds of equivalences.
    It isn't going to happen, Peter.

    > But I accept that this Coptic to Greek compatibility has a few problems
    > because not all characters have mappings. However, this is not a problem
    > for Phoenician, because *every* Phoenician character has an unambiguous
    > compatibility mapping to an existing Hebrew character.

    No encoded Phoenician character has a *compatibility* mapping
    to an existing Hebrew character until the UTC says that it does.

    The mappings are unambiguous and obvious, I grant you.

    But the chance that those mappings will be instantiated as compatibility
    decomposition mappings in the standard is zero.
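
    Such unambiguous mappings belong in a transliteration table, outside
    the normalization machinery. A sketch of what that fold looks like,
    assuming the 22 Phoenician letters in proposal order (they were later
    encoded at U+10900..U+10915) against the 22 Hebrew base letters, with
    the five final (sofit) forms excluded:

```python
# Hebrew base letters alef..tav, skipping the five final (sofit) forms.
FINALS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
hebrew = [cp for cp in range(0x05D0, 0x05EB) if cp not in FINALS]

# The 22 Phoenician letters, U+10900 (ALF) through U+10915 (TAU), in order.
phoenician = list(range(0x10900, 0x10916))

assert len(hebrew) == len(phoenician) == 22

# One-to-one transliteration table, applied with str.translate():
# this is cross-script equivalencing done where it belongs, in an
# application-level algorithm rather than in decomposition data.
PHOENICIAN_TO_HEBREW = {p: chr(h) for p, h in zip(phoenician, hebrew)}

def fold_phoenician(text: str) -> str:
    """Replace each Phoenician letter with its Hebrew counterpart."""
    return text.translate(PHOENICIAN_TO_HEBREW)

# PHOENICIAN ALF BET -> HEBREW ALEF BET
assert fold_phoenician("\U00010900\U00010901") == "\u05D0\u05D1"
```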

    > >I don't like the notion of interleaving in the default weighting
    > >table, and have spoken against it, but as John Cowan has pointed
    > >out, it is at least feasible. It doesn't have the ridiculousness
    > >factor of the compatibility decomposition approach.

    > If what I have suggested is ridiculous, so is what the UTC has already
    Bogus analogy alert.
    > defined for Mathematical Alphanumeric Symbols.
    > >...
    > >The equivalencing of 22 Phoenician letters, one-to-one against
    > >Hebrew characters, where the mapping is completely known and
    > >uncontroversial, is a minor molehill.
    > Well, why not make these uncontroversial equivalents, between variant
    > glyphs for the same script, compatibility decompositions?

    Well, if you don't understand the technical issues, Peter, how
    about this for a reason why not:

    Because even if you came personally to the UTC and argued the
    case for your position at the meeting, I predict that your
    proposal would be turned down by a 0 For, 12 Against vote.

    Of course, if you also brought along the Patron Saint of
    Lost Causes to help you, the vote might turn out 1 For, 11 Against.

    Now I've really had my fill of rehashing and regurgitation on
    Phoenician. If anybody wants to have any *actual* impact on
    the encoding decisions, I would suggest they finish writing
    up and submitting formal documents for the UTC discussion,
    instead of spending the weekend boring the rest of this list
    with a further commodious vicus of recirculation....


    This archive was generated by hypermail 2.1.5 : Fri Jun 04 2004 - 20:12:15 CDT