Re: What is the principle?

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 26 2004 - 15:10:28 EST


    > > (D) None of the above
    >
    > True.

    I would like to add to Jim Allan's excellent explanation
    here that the relevant coding domain for these decisions
    of same or different for encoding a particular character
    is the *script* in question.

    The first decision that needs to be taken is whether a
    particular writing system proposed for encoding constitutes
    a distinct script, in the sense used by Unicode/10646,
    or not.

    If the answer is no, and the writing system is considered,
    for example, to be a stylistic or historic variant of a
    script already encoded, then considerations of unification
    *within* a script come into effect. If it turns out that
    the particular writing system in question contains characters
    beyond those already encoded for that script,
    then those characters become valid candidates for additional
    encoding. Recent examples can be found among the various
    Arabic character additions for West African languages written
    in the Arabic script.

    If the answer is yes, then an entirely separate script will
    be encoded. This script must then have *all* of its
    characters encoded, even where there might be
    considerable overlap in appearance and/or linguistic function
    for some subset of those characters, for either historic
    reasons or merely by coincidence. An example of this can
    be seen in Old Italic, many of whose letterforms are clearly
    related to early Greek and to early Latin. Nevertheless, once
    Old Italic was distinguished as a script to be encoded, rather
    than just another variant alphabet (or set of alphabets, actually)
    of archaic Greek, that choice determined the further decisions
    about the repertoire to be encoded. It doesn't make any
    sense to just pick out those *particular* Old Italic letters
    that happen to be distinguishable in shape (U+10307 OLD ITALIC
    LETTER HE) or in function (U+1030E OLD ITALIC LETTER ESH)
    from Greek letters and to encode only them.
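
    As a small illustration of the point above, the two cited letters
    and an Old Italic letter that resembles its Greek counterpart can
    be looked up with Python's standard unicodedata module. This is
    only a sketch: the variable names are made up for illustration,
    and it assumes a Python build whose Unicode database includes the
    Old Italic block.

```python
import unicodedata

# The two Old Italic letters cited in the message, keyed by code point.
cited = {cp: unicodedata.name(chr(cp)) for cp in (0x10307, 0x1030E)}
for cp, name in cited.items():
    print(f"U+{cp:04X} {name}")

# Because Old Italic is a separate script, even a letter that resembles
# Greek alpha has its own code point rather than reusing the Greek one.
old_italic_a = chr(0x10300)
greek_alpha = "\u0391"
pair_names = (unicodedata.name(old_italic_a), unicodedata.name(greek_alpha))
print(pair_names)
```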

    Where people seem to get most hung up the first time they
    encounter UTC decisions about encoding characters (particularly
    for scripts, as opposed to symbol sets) is on these lookalike
    and/or historical relation questions. Hence the eternal newbie
    questions about Latin, Greek, and Cyrillic capital "A", for
    example.
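
    The capital "A" case can be checked the same way. The sketch
    below (variable names are illustrative, not from any
    specification) shows that the three lookalikes are distinct
    characters and that compatibility normalization does not fold
    them together.

```python
import unicodedata

# Latin, Greek, and Cyrillic capital "A": same appearance, three scripts.
lookalikes = ["\u0041", "\u0391", "\u0410"]
names = [unicodedata.name(ch) for ch in lookalikes]
for ch, name in zip(lookalikes, names):
    print(f"U+{ord(ch):04X} {name}")

# NFKC folds compatibility variants, but these are not variants of one
# another, so each character normalizes to itself and all three remain.
normalized = {unicodedata.normalize("NFKC", ch) for ch in lookalikes}
print(len(normalized))
```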

    --Ken


