Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Jun 05 2004 - 07:08:45 CDT

    On 04/06/2004 18:11, Kenneth Whistler wrote:

    > ...
    >
    >>There ARE cases in which entire
    >>alphabets have been given compatibility decompositions to other
    >>alphabets.
    >>
    >>
    > ^^^^^^^^^
    >
    >The operative word here is alphabets, as should be obvious. These
    >are not separate scripts. If they *had* been treated as separate
    >scripts, they would have been named differently and would *not*
    >have gotten compatibility decompositions.
    >
    >

    Well, we come back to the irreconcilable difference. You and Michael
    assert that Phoenician is a separate script from Hebrew. The scholars of
    Semitic writing who have written to this list or been quoted on it
    disagree, some in strong words, although some of them also agree with me
    that a mechanism for making plain text distinctions is also required.
    For example, Patrick Durusau, although in general terms he supports the
    proposal, wrote:

    > All Hudson is pointing out is that long PRIOR to Unicode, Semitic
    > scholars reached the conclusion all Semitic languages share the same
    > 22 characters. A long standing and quite useful conclusion that has
    > nothing at all to do with your proposal.

    But I dispute his last sentence. If the writing systems of these
    languages share the same abstract characters, they form a single script,
    which conflicts with the proposal to encode Phoenician as a separate script.

    >
    >
    >>For example there are the Mathematical Alphanumeric Symbols,
    >>the Enclosed Alphanumerics, and the Fullwidth and Halfwidth Forms, as
    >>well as superscripts, subscripts, modifier letters etc. These symbols
    >>have these compatibility decompositions because they are not considered
    >>to form a separate script,
    >>
    >>
    >
    >True.
    >
    >
    >
    >>but rather to be glyph variants of characters
    >>in Latin, Greek, Katakana etc script.
    >>
    >>
    >
    >Not a complete characterization. They are "presentation variants"
    >or other specifically styled versions of the alphabets,
    >encoded distinctly for one or another compatibility reason
    >presented in the history of the encoding. Nearly all of them
    >have a preexisting encoded (or named entity) existence that the
    >standard required mapping to.
    >
    >

    Well, the Mathematical Alphanumeric Symbols did not have a previous
    separate existence. And I am arguing that Phoenician and Hebrew are
    presentation variants of a single script, although not so specifically
    styled.
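
    To make concrete what a compatibility decomposition does in practice,
    here is a minimal Python sketch using the standard unicodedata module.
    It folds one character from each of three of the blocks mentioned above
    to its plain counterpart under NFKC. The helper name fold_compat is
    hypothetical, and the results depend on the Unicode data shipped with
    the Python build in use.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Apply compatibility normalization (NFKC) to text."""
    return unicodedata.normalize("NFKC", text)

samples = [
    "\U0001D400",  # a Mathematical Alphanumeric Symbol (bold capital A)
    "\uFF21",      # a Fullwidth Form (fullwidth capital A)
    "\u24B6",      # an Enclosed Alphanumeric (circled capital A)
    "\u2135",      # ALEF SYMBOL, a letterlike symbol
]

for ch in samples:
    folded = fold_compat(ch)
    # decomposition() returns the raw mapping, e.g. a tag plus code points
    print(unicodedata.name(ch), "|", unicodedata.decomposition(ch),
          "->", unicodedata.name(folded))
```

    The last sample is of some interest here: ALEF SYMBOL already carries
    a compatibility decomposition to HEBREW LETTER ALEF, so a symbol
    folding to a Hebrew letter is not without precedent.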

    >...
    >
    >If Phoenician (~ Old Canaanite) is separately encoded from Hebrew,
    >compatibility decompositions will *not* go in for folding them.
    >
    >If Phoenician (~ Old Canaanite) is *not* separately encoded from
    >Hebrew, then there won't be any separately encoded Phoenician
    >characters to have compatibility decompositions for, and thus,
    >again, compatibility decompositions will *not* go in for folding
    >them.
    >
    >

    This is in fact my preferred solution, because Phoenician or Old
    Canaanite is not a separate script and so should not be encoded as such.
    A mechanism is required for making a plain text distinction, but without
    defining a separate script. The Unicode standard recognised this situation:

    > Occasionally the need arises in text processing to restrict or change
    > the set of glyphs that are to be used to represent a character.
    > Normally such changes are indicated by choice of font or style in rich
    > text documents. In special circumstances, such a variation from the
    > normal range of appearance needs to be expressed side-by-side in the
    > same document in plain text contexts, where it is impossible or
    > inconvenient to exchange formatted text.

    That is a quote from TUS 4.0.1 section 15.6, p.397. In that section a
    mechanism, variation selectors, is defined for making such distinctions
    in plain text. You, Ken, have argued that this mechanism is
    inappropriate for a situation like this one. In that case, if we accept
    for the time being my premise that Phoenician is not a separate script
    but requires a plain text mechanism to distinguish it from Hebrew, there
    is a need for an alternative mechanism for such a situation.
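
    For readers unfamiliar with the mechanism, the following Python sketch
    shows how a variation selector behaves in plain text. The sequence used
    (Hebrew alef followed by VARIATION SELECTOR-1) is purely hypothetical:
    no such variation sequence is defined, and the helper name
    strip_variation_selectors is an assumption made for illustration.

```python
import unicodedata

ALEF = "\u05D0"  # HEBREW LETTER ALEF
VS1 = "\uFE00"   # VARIATION SELECTOR-1

# Hypothetical sequence: NOT a registered variation sequence.
marked = ALEF + VS1

def strip_variation_selectors(text: str) -> str:
    """Drop U+FE00..U+FE0F, leaving only the base characters."""
    return "".join(ch for ch in text if not ("\uFE00" <= ch <= "\uFE0F"))

print(unicodedata.name(VS1))
# The marked and unmarked strings differ as plain text ...
print(marked != ALEF)
# ... normalization does not remove the selector ...
print(unicodedata.normalize("NFKC", marked) == marked)
# ... but a process that ignores selectors sees the same base character.
print(strip_variation_selectors(marked) == ALEF)
```

    The point of the sketch is only that the distinction survives in plain
    text while the underlying character stays the same; whether such
    sequences are appropriate here is exactly what is in dispute.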

    >And just as you claim my statement below begs the question
    >regarding the status of Phoenician, so does your characterization
    >of "Phoenician and Hebrew variant glyphs of the same script".
    >
    >
    >
    I accept that what I wrote depends on the understanding of Semitic
    scholars that these are the same script.

    >>>Cross-script equivalencing is done by transliteration algorithms,
    >>>not by normalization algorithms.
    >>>
    >>>
    >
    >
    >
    >>This begs the question. Scholars of Semitic languages do not accept that
    >>
    >>
    > ^^^^^^^^
    >Recte: Some scholars
    >
    >

    Well, I have not seen any scholars of Semitic languages state that
    Phoenician is in principle a separate script from Hebrew, although some
    have accepted the proposal, from a misunderstanding of the
    character-glyph model and because it would be convenient in practice for
    their work.

    >
    >
    >
    >>this is a cross-script issue. They do not accept that representation of
    >>
    >>
    > ^^^^
    >Recte: Some scholars of Semitic languages
    >
    >

    No, "They" is quite adequate: if "Scholars" is corrected in the
    previous sentence, as you have already requested, there is no need for
    a further correction here. Stop picking the same nit twice.

    >
    >
    >
    >>a Phoenician, palaeo-Hebrew etc inscription with square Hebrew glyphs is
    >>transliteration. Rather, for them it is a matter of replacing an
    >>obsolete or non-standard glyph by a modern standard glyph for the same
    >>character - just as one would not describe as transliteration
    >>
    >>
    > ^^^^^^^
    >Bogus analogy alert.
    >
    >

    I accept that this analogy depends on my understanding of Phoenician as
    a script variety, like Fraktur, rather than a separate script.

    >
    >
    >
    >>representation in Times New Roman of a Latin script text in mediaeval
    >>handwriting or in Fraktur.
    >>
    >>
    >
    >...
    >
    >The issue is simply the difficulty of coming to consensus for
    >certain archaic collections of writing systems, what constitutes
    >an encodable script boundary and what does not.
    >
    >And that, my friend, was obvious a *MONTH* ago in this
    >discussion.
    >
    >

    Agreed. The problem comes from the continued inability of some to accept
    that the scholars of these scripts are in the best position to judge
    what constitutes a script boundary and what does not, or even to accept
    that their views are worthy of consideration and that there is some
    relevance to the fact that "long PRIOR to Unicode, Semitic scholars
    reached the conclusion all Semitic languages share the same 22 characters".

    >...
    >
    >>But I accept that this Coptic to Greek compatibility has a few problems
    >>because not all characters have mappings. However, this is not a problem
    >>for Phoenician, because *every* Phoenician character has an unambiguous
    >>compatibility mapping to an existing Hebrew character.
    >>
    >>
    >
    >No Phoenician encoded character has a *compatibility* mapping
    >to an existing Hebrew character until the UTC says that it
    >does.
    >
    >

    Well, I'll drop Coptic, and agree that my terminology was lacking at
    this point.

    >The mappings are unambiguous and obvious, I grant you.
    >
    >But the chance that those mappings will be instantiated as compatibility
    >decomposition mappings in the standard is zero.
    >
    >
    >
    >>>I don't like the notion of interleaving in the default weighting
    >>>table, and have spoken against it, but as John Cowan has pointed
    >>>out, it is at least feasible. It doesn't have the ridiculousness
    >>>factor of the compatibility decomposition approach.
    >>>
    >>>
    >
    >
    >
    >>If what I have suggested is ridiculous, so is what the UTC has already
    >>
    >>
    > ^^^^^
    >Bogus analogy alert.
    >
    >
    >
    >>defined for Mathematical Alphanumeric Symbols.
    >>
    >>

    The analogy is only bogus if you presuppose that Phoenician is a
    separate script.

    >>
    >>
    >>>...
    >>>The equivalencing of 22 Phoenician letters, one-to-one against
    >>>Hebrew characters, where the mapping is completely known and
    >>>uncontroversial, is a minor molehill.
    >>>
    >>>
    >>Well, why not make these uncontroversial equivalents, between variant
    >>glyphs for the same script, compatibility decompositions?
    >>
    >>
    >
    >Well, if you don't understand the technical issues, Peter, how
    >about this for a reason why not:
    >
    >Because even if you came personally to the UTC and argued the
    >case for your position at the meeting, I predict that your
    >proposal would be turned down by a 0 For, 12 Against vote.
    >
    >Of course, if you also brought along the Patron Saint of
    >Lost Causes to help you, the vote might turn out 1 For, 11 Against.
    >
    >

    Well, I still have some hope that the UTC might base their decisions on
    the theoretical character-glyph model and on the long-standing judgment
    of scholars of Semitic writing, rather than on the views of generalists
    and Indo-Europeanists supported by some who do not understand the
    character-glyph model. But I accept your argument that compatibility
    equivalence is not the best way to go.
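
    If the one-to-one mapping is kept outside normalization, it can be
    expressed as a simple transliteration table. The Python sketch below
    is illustrative only. It ASSUMES that the 22 Phoenician letters sit at
    consecutive code points starting at U+10900 in traditional alphabetical
    order; that placement is an assumption of the sketch, not something
    established in this thread. The names PHOENICIAN_BASE and
    to_square_hebrew are likewise hypothetical.

```python
import unicodedata

# ASSUMPTION: 22 Phoenician letters at consecutive code points from
# U+10900, in traditional alphabetical order (alef ... tav).
PHOENICIAN_BASE = 0x10900

# Hebrew final forms, excluded so that exactly 22 targets remain.
FINAL_FORMS = {0x05DA, 0x05DD, 0x05DF, 0x05E3, 0x05E5}
HEBREW_NONFINAL = [cp for cp in range(0x05D0, 0x05EB)
                   if cp not in FINAL_FORMS]

# One-to-one table: i-th Phoenician letter -> i-th non-final Hebrew letter.
PHOENICIAN_TO_HEBREW = {
    PHOENICIAN_BASE + i: cp for i, cp in enumerate(HEBREW_NONFINAL)
}

def to_square_hebrew(text: str) -> str:
    """Transliterate the assumed Phoenician code points to Hebrew letters."""
    return text.translate(PHOENICIAN_TO_HEBREW)

word = chr(0x10900) + chr(0x1090A) + chr(0x10915)  # 1st, 11th, 22nd letters
print([unicodedata.name(c) for c in to_square_hebrew(word)])
```

    Because the table always targets non-final letters, it sidesteps, but
    does not settle, the question of whether a word-final kaf, mem, nun,
    pe or tsadi should be rendered with the Hebrew final form.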

    >Now I've really had my fill of rehashing and regurgitation on
    >Phoenician. If anybody wants to have any *actual* impact on
    >the encoding decisions, I would suggest they finish writing
    >up and submitting formal documents for the UTC discussion,
    >instead of spending the weekend boring the rest of this list
    >with a further commodious vicus of recirculation....
    >
    >

    I was in the middle of preparing a formal submission when I realised
    that my arguments were tending towards compatibility equivalence and so
    felt the need to explore this avenue in more detail. I now accept that
    this is in fact a dead end.

    On 04/06/2004 21:50, Simon Montagu wrote:

    > Peter Kirk wrote:
    >
    >
    >> But I accept that this Coptic to Greek compatibility has a few
    >> problems because not all characters have mappings. However, this is
    >> not a problem for Phoenician, because *every* Phoenician character
    >> has an unambiguous compatibility mapping to an existing Hebrew
    >> character.
    >
    >
    > As I've said before, final forms in Hebrew make this not 100% true,
    > and I have seen both mappings in use in practice. For example
    > http://he.wikipedia.org/wiki/%D7%9E%D7%A6%D7%91%D7%AA_%D7%9E%D7%99%D7%A9%D7%A2
    > shows the text of the Mesha stele beginning
    > "אנכ. משע. בנ. כמש.. . מלכ. מאב", and I have a book (2 Kings in the
    > "Olam Hatanach" series) which shows it beginning
    > "אנך. משע. בן. כמש[ית] . מלך. מאב"
    >
    >
    Understood, and thank you for the point. I might argue that this
    ambiguity is not a real one but has been introduced by Unicode because
    of the choice it made to encode separately Hebrew final forms, but not
    Arabic ones. Separate encoding of Hebrew final forms is also a violation
    of the character-glyph model as these are variant glyphs for the same
    abstract character. But I accept that in modern Hebrew and Yiddish final
    forms have taken on some life of their own, justifying their separate
    encoding - although I might want to argue that they should have been
    defined as compatibility equivalents of non-final forms.
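
    The two transcriptions of the Mesha stele quoted above can be compared
    mechanically once final forms are folded to their non-final
    counterparts. The Python sketch below does this for the three words
    that differ; the name fold_finals is hypothetical, and the folding is
    an ad hoc table, since the final forms have no compatibility
    decompositions in the standard.

```python
import unicodedata

# Ad hoc folding table: Hebrew final form -> non-final form.
FINAL_TO_NONFINAL = str.maketrans({
    "\u05DA": "\u05DB",  # final kaf   -> kaf
    "\u05DD": "\u05DE",  # final mem   -> mem
    "\u05DF": "\u05E0",  # final nun   -> nun
    "\u05E3": "\u05E4",  # final pe    -> pe
    "\u05E5": "\u05E6",  # final tsadi -> tsadi
})

def fold_finals(word: str) -> str:
    """Replace Hebrew final forms with the corresponding non-final letters."""
    return word.translate(FINAL_TO_NONFINAL)

# Words from the two transcriptions, written as escapes to avoid
# bidirectional display problems in source code.
without_finals = ["\u05D0\u05E0\u05DB", "\u05D1\u05E0", "\u05DE\u05DC\u05DB"]
with_finals = ["\u05D0\u05E0\u05DA", "\u05D1\u05DF", "\u05DE\u05DC\u05DA"]

for a, b in zip(without_finals, with_finals):
    print(a, b, a == b, a == fold_finals(b))

# Normalization alone does not unify them: final kaf has no decomposition.
print(repr(unicodedata.decomposition("\u05DA")))
```

    Each pair compares unequal as encoded and equal after folding, which is
    the ambiguity at issue: the reverse direction requires knowing where
    words end.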

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

