Re: Script variants and compatibility equivalence, was: Response to Everson Phoenician and why June 7?

From: Peter Kirk
Date: Sat Jun 05 2004 - 12:17:41 CDT


    On 05/06/2004 08:25, John Hudson wrote:

    > Peter Kirk wrote:
    >>> All Hudson is pointing out is that long PRIOR to Unicode, Semitic
    >>> scholars reached the conclusion that all Semitic languages share the
    >>> same 22 characters. A long-standing and quite useful conclusion that
    >>> has nothing at all to do with your proposal.
    >> But I dispute his last sentence. If the writing systems of these
    >> languages share the same abstract characters, they form a single
    >> script, which conflicts with the proposal to encode Phoenician as a
    >> separate script.
    > Did you read, also, my messages regarding the perception of instances
    > of a script continuum? Restating your perception that the instances of
    > Phoenician and Hebrew represent the same 'script' for Unicode purposes
    > is just reverting to the fundamental disagreement with those who have
    > stated a desire or need to distinguish such instances in plain text.
    > 'Script' in Unicode is a generic term that does not necessarily relate
    > to notions of script outside Unicode. The determining feature of a
    > Unicode script, i.e. a labelled subset of characters, is that it is
    > something that can be differentiated from other subsets of characters
    > *in plain text*. Whether things so differentiated are considered
    > individual scripts outside of Unicode isn't very relevant to this
    > usage. Indeed, Unicode might have avoided all this debate by not using
    > the term script at all.

    Well, I tend to agree that the word "script" has not helped. Nor does it
    help that the definition you use here conflicts with the one Michael
    Everson uses when he insists that Phoenician is a separate script. On
    your definition it is clearly not one until the UTC defines it as one,
    so we end up with a circular argument.

    On your definition, the set of fullwidth forms U+FF01..U+FF5E is a
    separate script, because it is a labelled subset of characters which can
    be differentiated from any other such set in plain text. So is each of
    the subsets of mathematical alphanumeric symbols. But they have
    compatibility decompositions to the regular Latin-script characters. If
    these are separate scripts, I might accept that Phoenician should also be
    one. But Ken Whistler disagrees: he wrote yesterday, "These are not
    separate scripts."
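    The compatibility relationship Kirk describes can be checked with
    Python's standard `unicodedata` module. The sketch below assumes Python
    3.9 or later; the helper names `compat_info` and
    `fullwidth_block_folds_to_ascii` are illustrative only. It prints the
    decomposition mapping of one fullwidth letter and one mathematical
    letter, and checks that NFKC folds the whole U+FF01..U+FF5E range back
    to ASCII U+0021..U+007E, i.e. the characters are distinct in plain text
    but compatibility-equivalent to ordinary Latin-script characters.

```python
import unicodedata


def compat_info(ch: str) -> tuple[str, str, str]:
    """Return (character name, raw decomposition mapping, NFKC folding)."""
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFKC", ch),
    )


def fullwidth_block_folds_to_ascii() -> bool:
    """Check that NFKC maps every U+FF01..U+FF5E to ASCII U+0021..U+007E.

    The two ranges are offset from each other by 0xFEE0.
    """
    return all(
        unicodedata.normalize("NFKC", chr(cp)) == chr(cp - 0xFEE0)
        for cp in range(0xFF01, 0xFF5F)
    )


# FULLWIDTH LATIN CAPITAL LETTER A and MATHEMATICAL BOLD CAPITAL A:
# each is a separate code point, yet each folds to plain 'A' under NFKC.
for ch in ("\uFF21", "\U0001D400"):
    name, decomp, folded = compat_info(ch)
    print(f"U+{ord(ch):04X} {name}: decomposition={decomp!r}, NFKC={folded!r}")

print(fullwidth_block_folds_to_ascii())  # True
```

    The `<wide>` and `<font>` tags in the decomposition mappings are what
    mark these characters as compatibility variants rather than independent
    letters.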

    So let's drop "script" for now. My basic contention is that each letter
    of the Phoenician abjad is not a separate abstract character, but that
    it and the corresponding square Hebrew letter are glyph variants of the
    same abstract character. And this is clearly the understanding of
    Semitic scholars, as summarised by Patrick Durusau and quoted above. On
    the other hand, nearly everyone agrees that there should be a mechanism
    for distinguishing them in plain text.

    Is this a novel situation? No, for Unicode has clearly recognised this
    kind of situation in TUS section 15.6, which I quoted earlier. And
    Unicode has defined a mechanism for dealing with it, namely variation
    selectors. If this mechanism is not appropriate in this particular case,
    let the UTC come up with another mechanism to meet the user requirement.
    To define a new set of abstract characters for what are actually glyph
    variants is to ignore the character-glyph model.
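    The variation-selector approach can be sketched the same way. The
    sequence below, HEBREW LETTER ALEF (U+05D0) followed by VARIATION
    SELECTOR-1 (U+FE00), is purely hypothetical: it is not claimed to be a
    registered standardized variant, and `is_variation_selector` and
    `strip_variation_selectors` are illustrative helpers. The sketch shows
    the two properties the argument relies on: the tagged sequence is
    distinguishable from the bare letter in plain text (and survives
    normalization), yet removing the selector recovers the same abstract
    character.

```python
import unicodedata

ALEF = "\u05D0"  # HEBREW LETTER ALEF
VS1 = "\uFE00"   # VARIATION SELECTOR-1

# Hypothetical variation sequence requesting a Phoenician-style glyph for ALEF.
# It illustrates the mechanism only; no registered sequence is implied.
tagged = ALEF + VS1


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_variation_selectors(text: str) -> str:
    """Drop variation selectors, leaving only the underlying abstract characters."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


print(unicodedata.name(VS1))                            # VARIATION SELECTOR-1
print(tagged == ALEF)                                   # False: distinct in plain text
print(unicodedata.normalize("NFKC", tagged) == tagged)  # True: selector survives NFKC
print(strip_variation_selectors(tagged) == ALEF)        # True: same base letter
```

    Unlike the compatibility characters above, a variation sequence adds no
    new abstract character: a process that does not support the sequence
    can ignore the selector and still display an ordinary alef.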

    Peter Kirk

    This archive was generated by hypermail 2.1.5 : Sat Jun 05 2004 - 12:19:00 CDT