Definition of "Script" etc. (was: Re: Phoenician & Kharoṣṭhī proposals)

From: Christopher Fynn
Date: Sun May 30 2004 - 13:27:07 CDT

    John Hudson wrote:


    > I have been thinking today that part of the reason for the debate is
    > that Unicode has a singular concept of 'script', a bucket into which
    > variously shaped concepts of writing systems must be put or rejected.
    > I don't think there is anything conceptually wrong with the idea that
    > specific instances of a single script might be separately encoded if
    > there is a need or desire to distinguish them in plain text. It just
    > happens that Unicode has only one word that can be applied to such
    > instances, and that is 'script'. It seems clear to me now that what
    > Unicode calls a script needn't necessarily be what semiticists, or
    > anyone else, calls a script. A functional Unicode definition of script
    > might be formed as: a finite collection of characters that can be
    > distinguished in plain text from other collections of characters.



    "Script" is already defined in ISO 10646 as:

    <<4.35 script: A set of graphic characters used for the written form of
    one or more languages.>>

    and "graphic character" is defined as:

    << 4.20 graphic character: A character, other than a control function,
    that has a visual representation normally handwritten, printed, or
    displayed.>>

    So I guess if any further definition of "script" is necessary, it should
    be based on this.

    Further, the (draft?) ISO 15924 standard uses the same definition:

    << 3.7 script: A set of graphic characters used for the written
    form of one or more languages. (ISO/IEC 10646-1) (fr 3.6 écriture)>>

    but adds an extra note:

    << NOTE 1: A script, as opposed to an arbitrary subset of
    characters, is defined in distinction to other scripts; in
    general, readers of one script may be unable to read the
    glyphs of another script easily, even where there is a
    historic relation between them (see 3.9).>>

    [ 3.9 script variant: A particular form of one script which is so
    distinctive a rendering as to almost be considered
    a unique script in itself. (fr 3.9 variante d’écriture) ]

    With regard to historic and archaic scripts, The Unicode Standard (TUS)
    itself states: "The overall capacity for more than a million characters
    is more than sufficient for all known character encoding requirements,
    including full coverage of all minority and historic scripts of the
    world." (1.0)


    "As the universal character encoding scheme, the Unicode Standard must
    also respond to scholarly needs. To preserve world cultural heritage,
    important archaic scripts are encoded as proposals are developed." (1.1.2)

    So there is a clear statement of purpose to give full coverage to *all*
    minority and historic scripts and to encode "important" archaic scripts.

    In section 1.2, "Design Goals", TUS states:
    "The primary goal of the development effort for the Unicode Standard was
    to remedy two serious problems common to most multilingual computer
    programs. The first problem was the overloading of the font mechanism
    when encoding characters."

    Telling people who propose a script that they can "just use a
    different font" could very easily contradict this stated goal.

    > There are very real issues of software implementation, font
    > development, collation, text indexing and searching, etc. that arise
    > from encoding multiple instances of what some users consider a single
    > script, whether users in general opt to make the distinction in plain
    > text or not, by using the separate character collections or unifying
    > text in a single character collection and making the distinction at a
    > higher level. I'm beginning to think that our time would be better
    > spent thinking about those issues.

    These are of course real issues, particularly collation, text
    indexing, searching and, where a written language occurs in several
    scripts, the ability to display text encoded in one script with glyphs
    of another. Establishing standard, straightforward and widely supported
    means to deal with these issues is a worthy goal. In many cases the
    solutions to these problems are in fact already specified or pretty
    clear and, relatively speaking, reasonably straightforward to
    implement.

    Their absence, or lack of support, should not be a reason to
    reject a script proposal on the grounds that "it will cause
    difficulties". This is the same sort of argument that PR China put
    forward when it submitted its proposal for a host of precomposed Tibetan
    characters. When Indic scripts were first encoded, a whole software
    infrastructure and font/rendering technologies, which were not then
    available in common desktop operating systems, were assumed, and it has
    taken a decade for this encoding to be anything like widely supported on
    a practical level.

    IMO, in the long term, encoding of archaic scripts is going to benefit
    the whole scholarly community. When children discover all kinds of
    scripts on their computers they are going to become curious and play
    with them and some of them will be inspired to go out and find out more
    about these scripts. Some of these will develop a serious interest and a
    few will end up being the Palaeographers, Semiticists, Sanskritists and
    so on of tomorrow.

    - Chris

    This archive was generated by hypermail 2.1.5 : Sun May 30 2004 - 13:31:22 CDT