Re: Response to Everson Phoenician and why June 7?

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu May 20 2004 - 18:51:03 CDT

    Patrick said:

    > >In this case, I think it's important to be picky because there are
    > >no current Unicoding practices for Phoenician.
    > >
    > You may mean that the Unicode book does not document how Phoenician (or
    > Paleo-Hebrew) may be encoded. This is not to say that no one is using
    > Unicode to encode Paleo-Hebrew texts.
                 ^^^^^^
                 represent
                 
    I like to distinguish this, because the whole notion of
    what it means to "encode a text" tends to derail the discussion
    immediately.

    The Unicode Standard *encodes* abstract characters.

    There are many potential abstract characters, but one of the
    general principles used is that each significant "letter" (grapheme)
    from a *script* will be encoded once as a character in the
    standard. That, of course, raises the question of identifying
    the "script" and its exact repertoire of "letters". The identification
    of the "script" is what the Phoenician argument has been about,
    since there is no serious question about the repertoire of
    "letters" for it.

    Once a repertoire of abstract characters has been *encoded*
    in the Unicode Standard, those encoded characters can then
    be used to *represent* the plain text content of documents.

    This is deliberately different from talking about "encoding the
    text", because people don't have common understandings about
    what that means, and often expect various aspects of format
    and appearance to also be "encoded" -- hence the way these
    discussions tend to veer off into ditches.

    Now returning to Patrick's statement and substituting for a
    different unencoded script:

    > the Unicode standard does not document how *Avestan*
    > may be encoded. This is not to say that no one is using
    > Unicode to represent *Avestan* texts.

    Also true, right? Or...

    > the Unicode standard does not document how *Tifinagh*
    > may be encoded. This is not to say that no one is using
    > Unicode to represent *Tifinagh* texts.

    O.k., I guess you can see that this particular argument is not
    going to go anywhere. Any script which is not currently encoded
    in the standard can be (and probably is) represented *somehow*
    by Unicode characters, either via PUA or transliteration or
    some other arbitrary intermediate encoding of entities. That it
    is (or could be) so represented has little or no bearing on the
    question of whether the script in question is or is not
    distinct enough from some already encoded but historically
    related script to warrant a distinct encoding as a "script" in
    the Unicode sense of a script.
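
    A minimal sketch of that point, in Python: the same three letters
    (the root m-l-k) represented three different ways using only
    characters that are already available. The PUA code points and the
    letter-name keys here are hypothetical, chosen purely for
    illustration; no standard defines this PUA assignment.

```python
import unicodedata

# The abstract "letters" to represent, identified by conventional names.
WORD = ["mem", "lamed", "kaf"]

# Hypothetical private agreement: arbitrary Private Use Area code points.
PUA = {"mem": "\uE000", "lamed": "\uE001", "kaf": "\uE002"}

# Transliteration into an already encoded, historically related script.
HEBREW = {"mem": "\u05DE", "lamed": "\u05DC", "kaf": "\u05DB"}

# Transliteration into Latin letters.
LATIN = {"mem": "m", "lamed": "l", "kaf": "k"}


def represent(letters, scheme):
    """Represent a sequence of abstract letters as plain text under one scheme."""
    return "".join(scheme[letter] for letter in letters)


def describe(text):
    """List each character's code point and Unicode general category."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]


if __name__ == "__main__":
    for label, scheme in (("PUA", PUA), ("Hebrew", HEBREW), ("Latin", LATIN)):
        print(label, describe(represent(WORD, scheme)))
```

    All three are plain text in Unicode characters. The PUA version
    carries general category Co (private use), so it means nothing
    without the private agreement; the other two are ordinary letters
    (Lo and Ll) that software cannot tell apart from Hebrew or Latin
    text. None of this says whether the script deserves its own
    encoding.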

    John Hudson asked, again:

    > My question, again, is whether there is a need for the plain
    > text distinction in the first place?

    And I claim that there is no final answer for this question. We
    simply have irresolvable differences of opinion, with some
    asserting that it is self-evident that there is such a need,
    and others asserting that it is ridiculous to even consider
    encoding Phoenician as a distinct script, and that there is
    no such need.

    My own take on this seemingly irreconcilable clash of opinion is
    that if *some* people assert a need (and if they seem to be
    reasonable people instead of crackpots with no demonstrable
    knowledge of the standard and of plain text) then there *is*
    a need. And that people who assert that there is *no* need
    are really asserting that *they* have no need and are making
    the reasonable (but fallacious) assumption that since they
    are rational and knowledgeable, the fact that *they* have no
    need demonstrates that there *is* no need.

    If such is the case, then there *is* a need -- the question
    then just devolves to whether the need is significant enough
    for the UTC and WG2 to bother with it, and whether even if
    the need is met by encoding of characters, anyone will actually
    implement any relevant behavior in software or design fonts
    for it.

    In my opinion, Phoenician as a script has passed a
    reasonable need test, and has also passed a significant-enough-
    to-bother test.

    Note that these considerations need to be matters of
    reasonableness and appropriateness. There is no absolutely
    correct answer to be sought here. A character encoding standard
    is an engineering construct, not a revelation of truth, and
    we are seeking solutions that will enable software handling
    text content and display to do reasonable things with it at
    reasonable costs.

    If you start looking for absolutes here, it is relatively easy
    to apply reductio ad absurdum. In an absolute sense, there is
    no "need" to encode *any* other script, because they can *all*
    be represented by one or another transliteration scheme or
    masquerading scheme and be rendered with some variety or
    other of symbol font encoding. After all, that's exactly what
    people have been doing to date already for them -- or they
    are making use of encodings outside the context of Unicode,
    which they could go on using, or they are making use of graphics
    and facsimiles, and so on. The world wouldn't end if all such
    methods and "hacks" continued in use.

    The question is rather whether, given the fundamental nature of
    the Unicode Standard as enabling text processing for modern
    software, it is cost-effective and *reasonable* to provide
    a Unicode encoding for one particular script or another,
    unencoded to date, so as to maximize the chances that it
    will be handled more easily by modern software in the global
    infrastructure and to minimize the costs associated with
    doing so.

    *That* is the test which should be applied when trying to
    make decisions about which of the remaining varieties of
    unencoded writing systems rise to the level of distinctness,
    utility, and cost-effectiveness to be encoded as another
    script in the standard.

    --Ken


