Re: String name and Character Name

From: Hans Aberg
Date: Fri Apr 22 2005 - 05:26:21 CST

    At 08:38 +0100 2005/04/22, Arcane Jill wrote:
    >>I don't know why there is a need for a
    >>second "unique and immutable identifier" in addition to the U+xxxx code
    >>point identifier. But given that there is such a list, its highly
    >>restricted intended purpose should be made more clear. This must be done
    >>to reduce the problem of people, even major software companies which are
    >>Unicode consortium members, using the list in unintended ways as
    >>meaningful text.

    >Like some others here, I simply don't see the point of
    >human-readable machine-readable list. One or the other, yes, but not
    >both at the same time. There is absolutely no need for an immutable
    >machine-readable list to be human-readable /at the same time/.
    >U-[xx]xxxx works perfectly well as a unique machine-readable
    >identifier, /and/ would work perfectly well as a localization key.
    >(In fact, a database table which uses a numeric primary key is
    >likely to be more efficient than a database table that uses a string
    >primary key).

    Giving each abstract character a unique, human-readable name is in
    the first place useful to humans who want to use it to identify the
    characters. If one wants to, say, define a new character set that
    eventually might get its own character numbering, then structurally
    it would be better to use those names.

    Then, whether one would use those character names or the U-X..X
    character numbers is just a question of what is useful in the
    implementation. If there is a list by which one can always
    translate back and forth between character names and character
    numbers, then an implementation can always use, say, the
    character numbers, and translate into character names whenever a
    human wants to interpret them. But in a computer implementation,
    one should not assume that an efficient logical representation leads
    to an efficient computer implementation. For example, when
    implementing a functional language, there is an efficient de Bruijn
    representation that does not need traditional lambda variable names
    and has some other pleasing logical properties. However, it is rarely
    used in actual functional language implementations, as debugging,
    which is carried out by humans, becomes very difficult.
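
    The back-and-forth translation between character numbers and
    character names can be sketched as follows. This is only a minimal
    illustration, assuming Python and its standard unicodedata module;
    the helper names to_name, to_code_point and describe are made up
    for this example and are not part of any standard.

```python
import unicodedata


def to_name(code_point):
    """Translate a character number into its Unicode character name."""
    # Raises ValueError for code points without a name (e.g. control codes).
    return unicodedata.name(chr(code_point))


def to_code_point(name):
    """Translate a Unicode character name back into its character number."""
    # Raises KeyError if no character has this name.
    return ord(unicodedata.lookup(name))


def describe(text):
    """List each character as 'U+XXXX NAME' for a human reader."""
    return [
        "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]
```

    An implementation can thus store and compare only the numbers, and
    call something like describe() when a human needs to read the result.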

    One can also note that the U-X..X numbers are there only because they
    are thought to be efficient with our current computer technologies.
    First, if one compares two strings in a computer, that in effect
    amounts to comparing multiprecision numbers, which one may want to
    avoid in a time-critical application. Second, if
    one were to represent a text using the character names explicitly
    and then apply a common compression technique, then that
    compression, if properly done, will create a character table which
    will be more efficient than the U-X..X representation. So with
    systematic use of suitable compression techniques, one may do away
    with both the character numbers U-X..X and the various character
    encodings UTF-8/16/32.
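
    One way to try the compression idea is to spell a text out as
    character names, compress that, and compare with compressing the
    UTF-8 bytes of the same text. The sketch below assumes Python's
    standard zlib and unicodedata modules; the one-name-per-line format
    and the function names are hypothetical choices for this example.
    It only measures sizes and does not by itself show which
    representation wins for a given text.

```python
import unicodedata
import zlib


def spell_out(text):
    """Represent text as one character name per line (hypothetical format)."""
    # unicodedata.name raises ValueError for unnamed characters such as
    # control codes, so the input must not contain newlines or tabs.
    return "\n".join(unicodedata.name(ch) for ch in text)


def read_back(spelled):
    """Recover the original text from its spelled-out form."""
    if not spelled:
        return ""
    return "".join(unicodedata.lookup(name) for name in spelled.split("\n"))


def compressed_sizes(text):
    """Return (bytes when compressing names, bytes when compressing UTF-8)."""
    # Unicode character names use only ASCII letters, digits, space, hyphen.
    by_name = zlib.compress(spell_out(text).encode("ascii"), 9)
    by_utf8 = zlib.compress(text.encode("utf-8"), 9)
    return len(by_name), len(by_utf8)
```

    Calling compressed_sizes() on a sample text gives the two sizes to
    compare; the result will depend on the text and on the compressor.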

    In the end, this discussion leads to a common one about computer
    languages, which usually are all Turing equivalent. If all
    these computer languages are Turing equivalent, and thus can express
    exactly the same algorithms, why not simply select one computer
    language and do away with all the others? In reality, though,
    different computer languages differ immensely in how efficiently
    different logical structures are implemented, both for humans and in
    the computer. So one ends up with tradeoffs which will depend on the
    humans, the implementations and the computer the software will run on.
    The same will apply to choices such as that between using the U-X..X
    numbers or the character names for identification.

       Hans Aberg