Re: Fachwörterliste

From: John Hudson (john@tiro.ca)
Date: Fri Mar 03 2006 - 12:48:30 CST

    E. Keown wrote:

    > Character set, a definition :
    > A character set is a computerized version
    > of any alphabet (or other writing system).
    >
    > Each letter, number, symbol, etc. of the
    > computerized alphabet is assigned a unique
    > number for the computer to use in software.

    This definition suggests, or presupposes, a direct correlation between the structure of an
    encoding and the structure of a writing system. The problem is that a) such a correlation
    does not always hold, b) the structural analysis of writing systems is a relatively new
    field, and c) there is sometimes disagreement about how to describe the structure of a
    writing system correctly (see, for instance, the discussions regarding Tamil on the Indic list). I
    am wary of a definition that would lead people to conclude that a character encoding must
    directly correlate to a particular understanding of the structure of a writing system. The
    goal of a character encoding is to be workable, i.e. to enable the encoding of text and
    the performance of typical text processing functions (searching, sorting, string
    comparisons, etc.). It is not the goal of a character encoding to provide a computerised
    model of how a writing system is thought of in the minds of the people who write it or
    study it.
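
As a small illustration of the point above, here is a minimal Python sketch (Python and the helper names code_points and same_text are assumptions made for this example, not part of the original message). It shows one case where the structure of the encoding does not mirror what a reader perceives: the single letter "é" can be encoded in Unicode either as one code point or as a base letter plus a combining mark, and a workable string comparison has to account for both.

```python
import unicodedata

# Two encodings of what a reader sees as the single letter "é".
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def code_points(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


def same_text(a, b):
    """Compare two strings after canonical normalization to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(code_points(precomposed))            # one code point
print(code_points(decomposed))             # two code points
print(precomposed == decomposed)           # raw comparison of code points
print(same_text(precomposed, decomposed))  # comparison after normalization
```

The raw comparison treats the two strings as different because their code points differ, while the normalized comparison treats them as the same text. Neither encoding is a model of how writers think of the letter; both exist to make text processing workable.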

    The Unicode glossary defines Character Set as

            A collection of elements used to represent textual
            information.

    which seems to me to be a good place to start. Notice that the definition references the
    *use* of the characters, rather than their identity as it relates to writing systems. The
    Unicode glossary seems to me quite a good 'Fachwörterliste':

            http://www.unicode.org/glossary/

    If I wanted a more explanatory definition for 'non-geeks', I would try something like this:

            A character set is a collection of elements (letters,
            symbols, punctuation, numerals, etc.) needed to represent
            text on a computer. Each element in a character set is
            assigned a unique numeric identity, which is recognised
            by computer software employing the character set.
            Standardised character sets facilitate the interchange of
            text between computers, and enable computerised text
            processing operations such as searching, sorting, and
            comparing text. A particular character set may encode
            one or more writing systems.
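
The definition above can be sketched in a few lines of Python. The five-element TOY_CHARSET below is invented purely for illustration and does not correspond to any real standard; the encode and decode helpers are likewise hypothetical. The sketch shows elements being assigned unique numbers, and sorting and comparing being carried out on those numbers rather than on the letters themselves.

```python
# A toy character set, invented for this example: each element gets a unique number.
TOY_CHARSET = {" ": 0, ".": 1, "a": 2, "b": 3, "c": 4}
TOY_DECODE = {number: element for element, number in TOY_CHARSET.items()}


def encode(text):
    """Turn text into the list of numbers the toy character set assigns."""
    return [TOY_CHARSET[ch] for ch in text]


def decode(numbers):
    """Turn a list of toy character numbers back into text."""
    return "".join(TOY_DECODE[n] for n in numbers)


words = ["cab", "abc", "bac"]
encoded = [encode(word) for word in words]

# Sorting and comparing operate on the assigned numbers.
sorted_words = [decode(numbers) for numbers in sorted(encoded)]
print(sorted_words)
print(encode("cab") == encode("cab"))

# Python's built-in ord() and chr() expose the same idea for Unicode code points.
print(ord("A"), chr(65))
```

Two programs can interchange text encoded this way only if they agree on the same table, which is what a standardised character set provides.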

    John Hudson

    -- 
    Tiro Typeworks        www.tiro.com
    Vancouver, BC         john@tiro.ca
    I am not yet so lost in lexicography, as to forget
    that words are the daughters of earth, and that things
    are the sons of heaven.  - Samuel Johnson
    

