Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Jun 04 2004 - 20:11:26 CDT

Next message: Simon Montagu: "Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?"

Previous message: Asmus Freytag: "Re: Revised Phoenician proposal"
Maybe in reply to: Peter Kirk: "Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?"
Next in thread: Peter Kirk: "Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?"
Reply: Peter Kirk: "Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?"

Peter,

> >>>>There is no consensus that this Phoenician proposal is necessary. I
> >>>>and others have also put forward several mediating positions e.g.
> >>>>separate encoding with compatibility decompositions
> >>>>
> >>>Which was rejected by Ken for good technical reasons.
> >>>
> >>I don't remember any technical reasons, it was more a matter of "we
> >>haven't done it this way before".
> >
> >The *reason* why we haven't done it this way before is because
> >it would cause technical difficulties.
>
> I am revisiting this one because I realise now that Ken has been
> somewhat economical with the truth here.

No more economical than Peter has apparently been with good sense.

> There ARE cases in which entire
> alphabets have been given compatibility decompositions to other
> alphabets.
^^^^^^^^^

The operative word here is alphabets, as should be obvious. These
are not separate scripts. If they *had* been treated as separate
scripts, they would have been named differently and would *not*
have gotten compatibility decompositions.

> For example there are the Mathematical Alphanumeric Symbols,
> the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as
> well as superscripts, subscripts, modifier letters etc. These symbols
> have these compatibility decompositions because they are not considered
> to form a separate script,

True.

> but rather to be glyph variants of characters
> in Latin, Greek, Katakana etc script.

Not a complete characterization. They are "presentation variants"
or other specifically styled versions of the alphabets,
encoded distinctly for one or another compatibility reason
arising in the history of the encoding. Nearly all of them
already existed as encoded characters (or named entities) in
other standards that Unicode was required to map to.
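A minimal sketch, using Python's standard `unicodedata` module, of what those compatibility decompositions look like in practice. Each one carries a formatting tag (`<font>`, `<circle>`, `<wide>`, `<narrow>`, `<super>`, `<sub>`), and NFKC folds the character to its plain letter or digit while NFC leaves it alone. The sample characters and the helper name `describe` are chosen for illustration only.

```python
import unicodedata

# One sample from each class of compatibility character named above,
# paired with the plain character NFKC is expected to fold it to.
SAMPLES = {
    "\U0001D400": "A",   # MATHEMATICAL BOLD CAPITAL A
    "\u2460": "1",       # CIRCLED DIGIT ONE (Enclosed Alphanumerics)
    "\uFF21": "A",       # FULLWIDTH LATIN CAPITAL LETTER A
    "\uFF71": "\u30A2",  # HALFWIDTH KATAKANA LETTER A -> KATAKANA LETTER A
    "\u00B2": "2",       # SUPERSCRIPT TWO
    "\u2082": "2",       # SUBSCRIPT TWO
    "\u02B0": "h",       # MODIFIER LETTER SMALL H
}

def describe(ch: str) -> str:
    """Name, decomposition tag, and NFKC result (as code points) of one character."""
    tag = unicodedata.decomposition(ch).split()[0]  # e.g. "<font>"
    folded = unicodedata.normalize("NFKC", ch)
    target = " ".join(f"U+{ord(c):04X}" for c in folded)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {tag} -> {target}"

for ch, plain in SAMPLES.items():
    # Compatibility normalization folds the variant to the plain character...
    assert unicodedata.normalize("NFKC", ch) == plain
    # ...but canonical normalization does not: the mapping is compatibility-only.
    assert unicodedata.normalize("NFC", ch) == ch
    print(describe(ch))
```

The first line printed is `U+1D400 MATHEMATICAL BOLD CAPITAL A: <font> -> U+0041`. The tag records *how* the character differs from its base letter, which is exactly the information NFKC throws away.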

That situation differs from random individual glyph variants
of characters, which do not get the same treatment.

> Do these compatibility
> decompositions cause technical difficulties?

In some contexts, yes.
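A minimal sketch of one such difficulty, again with Python's `unicodedata`: because NFKC discards the formatting tag, it also erases distinctions some texts depend on. In mathematical notation a bold variable and an italic variable collapse into the same letter, and a superscript exponent becomes an ordinary digit. The helper name `nfkc` is just a local convenience.

```python
import unicodedata

def nfkc(s: str) -> str:
    return unicodedata.normalize("NFKC", s)

bold_x = "\U0001D431"    # MATHEMATICAL BOLD SMALL X (e.g. a vector)
italic_x = "\U0001D465"  # MATHEMATICAL ITALIC SMALL X (e.g. a scalar)

# Distinct characters before normalization...
assert bold_x != italic_x
# ...but indistinguishable after compatibility normalization.
assert nfkc(bold_x) == nfkc(italic_x) == "x"

# "x squared" turns into "x2".
x_squared = "x\u00B2"    # x followed by SUPERSCRIPT TWO
assert nfkc(x_squared) == "x2"

# Canonical normalization (NFC) preserves all of these distinctions.
assert unicodedata.normalize("NFC", x_squared) == x_squared
print(nfkc(bold_x + " " + italic_x + " " + x_squared))  # prints "x x x2"
```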

>
> >Compatibility decompositions directly impact normalization.
>
> Of course. And the point of suggesting compatibility decomposition here
> is precisely so that compatibility normalisation, as well as default
> collation, folds together Phoenician and Hebrew variant glyphs of the
> same script.

I understood what you were trying to do. It was obvious, and there
was no need to repeat it.

What I was attempting to tell you is that there is no chance that
the UTC is going to start using compatibility decompositions as
a way to do folding *between* scripts.

If Phoenician (~ Old Canaanite) is separately encoded from Hebrew,
compatibility decompositions will *not* go in for folding them.

If Phoenician (~ Old Canaanite) is *not* separately encoded from
Hebrew, then there won't be any separately encoded Phoenician
characters to have compatibility decompositions for, and thus,
again, compatibility decompositions will *not* go in for folding
them.
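A sketch of the division of labor described here, assuming Python's `unicodedata`: no normalization form equates characters across scripts, even when their glyphs look identical. Latin A, Greek Alpha and Cyrillic A stay three distinct characters under NFC, NFD, NFKC and NFKD. Equating them takes a separate, explicit mapping of the kind a transliteration step supplies; the `FOLD_TO_LATIN` table and `fold` function below are hypothetical, for illustration only.

```python
import unicodedata

LOOKALIKES = [
    "\u0041",  # LATIN CAPITAL LETTER A
    "\u0391",  # GREEK CAPITAL LETTER ALPHA
    "\u0410",  # CYRILLIC CAPITAL LETTER A
]

# No normalization form folds these together: each stays its own character.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = {unicodedata.normalize(form, ch) for ch in LOOKALIKES}
    assert len(normalized) == 3, form

# Cross-script equivalence has to be supplied explicitly, outside normalization.
# Hypothetical transliteration table, for illustration only.
FOLD_TO_LATIN = {"\u0391": "A", "\u0410": "A"}

def fold(text: str) -> str:
    """Apply the hypothetical cross-script table character by character."""
    return "".join(FOLD_TO_LATIN.get(ch, ch) for ch in text)

assert {fold(ch) for ch in LOOKALIKES} == {"A"}
```

Keeping the table outside normalization means it can be applied only where an application actually wants the folding, without changing what NFKC does for everyone else.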

And just as you claim my statement below begs the question
regarding the status of Phoenician, so does your characterization
of "Phoenician and Hebrew variant glyphs of the same script".

>
> >Cross-script equivalencing is done by transliteration algorithms,
> >not by normalization algorithms.

>
> This begs the question. Scholars of Semitic languages do not accept that
                          ^^^^^^^^
Recte: Some scholars

> this is a cross-script issue. They do not accept that representation of
                                ^^^^
Recte: Some scholars of Semitic languages

> a Phoenician, palaeo-Hebrew etc inscription with square Hebrew glyphs is
> transliteration. Rather, for them it is a matter of replacing an
> obsolete or non-standard glyph by a modern standard glyph for the same
> character - just as one would not describe as transliteration
              ^^^^^^^
Bogus analogy alert.

> representation in Times New Roman of a Latin script text in mediaeval
> handwriting or in Fraktur.

This begs the question.

> I would suggest that a clear distinction should be made, in an
> appropriate part of the Unicode Standard, between transliteration
> (between separate scripts) and what one might call glyph normalisation
> (between variant forms of the same script).

No one is going to call that "glyph normalization", as that would
just further muddy the waters, rather than clarifying anything.

In my opinion the standard is already quite clear about this,
and users of the standard (implementing text rendering,
text normalization, text transliteration) have had no particular
problem understanding what they were supposed to be doing.

The issue is simply the difficulty of coming to consensus, for
certain archaic collections of writing systems, on what constitutes
an encodable script boundary and what does not.

And that, my friend, was obvious a *MONTH* ago in this
discussion.

> >>However, I can make
> >>a good case for the new Coptic letters being made compatibility
> >>equivalent to Greek - which can still be done, presumably -

> >
> >But will not be done. ...
>
> This is not intended as a transliteration solution. It is intended to
> recognise that *some* Coptic letters are glyph variants of Greek
> letters, as previously recognised by the UTC, whereas *others* are not.
> As a result only the former set would have compatibility decompositions
> - and as it happens those are precisely the ones which are proposed for
> new encoding, and so for which compatibility decompositions can still be
> defined. This also has the major advantage that it folds together, for
> normalisation and default collation, texts which have been encoded
> according to the existing definitions for Coptic and those which will be
> encoded according to the new definitions.

At the risk of sounding like a Michael Everson clone:

But will not be done.

I don't know why you persist in making suggestions that have zero
chance of obtaining any consensus in the UTC -- trying to
misapply mechanisms that were designed for other purposes, and
that are constrained by how they are currently implemented and
tied to normalization, to completely unrelated kinds of
equivalence. It isn't going to happen, Peter.

> But I accept that this Coptic to Greek compatibility has a few problems
> because not all characters have mappings. However, this is not a problem
> for Phoenician, because *every* Phoenician character has an unambiguous
> compatibility mapping to an existing Hebrew character.

No Phoenician encoded character has a *compatibility* mapping
to an existing Hebrew character until the UTC says that it
does.

The mappings are unambiguous and obvious, I grant you.

But the chance that those mappings will be instantiated as compatibility
decomposition mappings in the standard is zero.
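A sketch of how an unambiguous one-to-one mapping can live as an ordinary lookup table, without touching decomposition data. Assumptions, stated loudly: the Phoenician side uses the 22 letters of the Phoenician block of current Unicode versions (U+10900 to U+10915) purely as sample data, and pairs them in alphabetical order with the 22 Hebrew letters that are not final forms. The names `PHOENICIAN_TO_HEBREW` and `to_square_hebrew` are hypothetical.

```python
import unicodedata

# Hebrew letters U+05D0..U+05EA minus the five final forms: the 22 letters
# of the alphabet in traditional order.
HEBREW = [chr(cp) for cp in range(0x05D0, 0x05EB)
          if "FINAL" not in unicodedata.name(chr(cp))]

# Assumption: the Phoenician block's 22 letters, U+10900..U+10915, same order.
PHOENICIAN = [chr(cp) for cp in range(0x10900, 0x10916)]

assert len(HEBREW) == len(PHOENICIAN) == 22

# Hypothetical explicit one-to-one table, kept outside the normalization data.
PHOENICIAN_TO_HEBREW = dict(zip(PHOENICIAN, HEBREW))

def to_square_hebrew(text: str) -> str:
    """Replace each Phoenician letter with its Hebrew counterpart; leave others as-is."""
    return "".join(PHOENICIAN_TO_HEBREW.get(ch, ch) for ch in text)

# The table is a bijection, so it can be inverted without loss.
assert len(set(PHOENICIAN_TO_HEBREW.values())) == 22

# Normalization itself does not relate the two sets of characters.
for ch in PHOENICIAN:
    assert unicodedata.decomposition(ch) == ""
    assert unicodedata.normalize("NFKC", ch) == ch
```

For example, `to_square_hebrew("\U00010900\U00010901")` returns `"\u05D0\u05D1"` (alef, bet).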

> >I don't like the notion of interleaving in the default weighting
> >table, and have spoken against it, but as John Cowan has pointed
> >out, it is at least feasible. It doesn't have the ridiculousness
> >factor of the compatibility decomposition approach.
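Interleaving in a default weighting table can be pictured with a toy collation key. In this sketch, Hebrew letter *n* gets primary weight 2n and the parallel Phoenician letter 2n+1, so the two alphabets sort letter by letter together while their characters still compare unequal. The weights are invented for illustration and are not the Unicode Collation Algorithm's actual table values; the Phoenician code points are again assumed to be U+10900 to U+10915, and `WEIGHTS` and `sort_key` are hypothetical names.

```python
import unicodedata

HEBREW = [chr(cp) for cp in range(0x05D0, 0x05EB)
          if "FINAL" not in unicodedata.name(chr(cp))]
PHOENICIAN = [chr(cp) for cp in range(0x10900, 0x10916)]  # assumed 22 letters

# Toy primary weights: Hebrew letter n gets 2n, the parallel Phoenician letter
# 2n+1, so the alphabets are interleaved rather than sorted one block after
# the other.
WEIGHTS = {}
for n, (heb, phn) in enumerate(zip(HEBREW, PHOENICIAN)):
    WEIGHTS[heb] = 2 * n
    WEIGHTS[phn] = 2 * n + 1

def sort_key(word: str) -> list[int]:
    """Primary weights only; characters outside the table sort last."""
    return [WEIGHTS.get(ch, 10_000) for ch in word]

words = [
    "\u05D1",      # Hebrew bet
    "\U00010900",  # Phoenician alf
    "\u05D0",      # Hebrew alef
    "\U00010901",  # Phoenician bet
]
print([f"U+{ord(w):04X}" for w in sorted(words, key=sort_key)])
# prints ['U+05D0', 'U+10900', 'U+05D1', 'U+10901']
```

Unlike a compatibility decomposition, this shares the *ordering* of the two alphabets without declaring their characters equivalent: Hebrew alef and Phoenician alf still get different keys.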

>
> If what I have suggested is ridiculous, so is what the UTC has already
^^^^^
Bogus analogy alert.

> defined for Mathematical Alphanumeric Symbols.
>
> >...
> >The equivalencing of 22 Phoenician letters, one-to-one against
> >Hebrew characters, where the mapping is completely known and
> >uncontroversial, is a minor molehill.
>
> Well, why not make these uncontroversial equivalents, between variant
> glyphs for the same script, compatibility decompositions?

Well, if you don't understand the technical issues, Peter, how
about this for a reason why not:

Because even if you came personally to the UTC and argued the
case for your position at the meeting, I predict that your
proposal would be turned down by a 0 For, 12 Against vote.

Of course, if you also brought along the Patron Saint of
Lost Causes to help you, the vote might turn out 1 For, 11 Against.

Now I've really had my fill of rehashing and regurgitation on
Phoenician. If anybody wants to have any *actual* impact on
the encoding decisions, I would suggest they finish writing
up and submitting formal documents for the UTC discussion,
instead of spending the weekend boring the rest of this list
with a further commodious vicus of recirculation....

--Ken

This archive was generated by hypermail 2.1.5 : Fri Jun 04 2004 - 20:12:15 CDT