RE: Clones (was RE: Hexadecimal)

From: Asmus Freytag (
Date: Tue Aug 19 2003 - 08:57:25 EDT


    Compatibility characters:

    The recommendations for compatibility characters are necessarily vague,
    since their use in legacy data (and legacy environments) is strongly
    dependent on what is (or was) customary in a given environment.

    If a process merely warehouses text data (or parses only a very small
    subset of characters for a special purpose, as an HTML parser does), then
    simply preserving legacy characters is often the best strategy. However,
    take the opposite example: a process that actually scans the text for
    Roman numerals. In that case, ignoring the compatibility characters would
    be a mistake, since legacy data of the kind for which these compatibility
    characters were added would *only* contain Roman numerals in this form.
    They would *not* use the ASCII characters.
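
To illustrate the Roman numeral case, here is a minimal sketch of a scanner that treats both encodings as numerals. It is not taken from any existing product. The function names are hypothetical, and it assumes that compatibility Roman numerals can be identified by their character names ("ROMAN NUMERAL ..." / "SMALL ROMAN NUMERAL ...") and read through the numeric values that Python's standard unicodedata module exposes. The ASCII matching is deliberately naive.

```python
from __future__ import annotations

import re
import unicodedata

# Naive ASCII matcher (hypothetical): a whole word made only of upper-case numeral letters.
# It will also match the pronoun "I" or a word like "MIX"; a real scanner needs context.
ASCII_ROMAN = re.compile(r"\b[IVXLCDM]+\b")
ASCII_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def ascii_roman_value(token: str) -> int:
    """Value of an ASCII Roman numeral using the subtractive rule (no validation)."""
    total = 0
    for i, ch in enumerate(token):
        value = ASCII_VALUES[ch]
        # A smaller numeral before a larger one is subtracted (IV = 4, XC = 90).
        if i + 1 < len(token) and value < ASCII_VALUES[token[i + 1]]:
            total -= value
        else:
            total += value
    return total


def compat_roman_value(ch: str) -> int | None:
    """Value of one compatibility Roman numeral character (e.g. U+216B), else None."""
    name = unicodedata.name(ch, "")
    if not name.startswith(("ROMAN NUMERAL", "SMALL ROMAN NUMERAL")):
        return None
    # Some characters in this range (e.g. U+2183) have no numeric value.
    value = unicodedata.numeric(ch, None)
    return None if value is None else int(value)


def find_roman_numerals(text: str) -> list[tuple[int, int]]:
    """Return (offset, value) pairs for Roman numerals in either encoding, in text order."""
    found = [(m.start(), ascii_roman_value(m.group())) for m in ASCII_ROMAN.finditer(text)]
    for offset, ch in enumerate(text):
        value = compat_roman_value(ch)
        if value is not None:
            found.append((offset, value))
    return sorted(found)
```

An ASCII-only scan of "Chapter XII; legacy: Ⅻ, ⅻ" finds one numeral. The combined scan finds three, each with the value 12, which is the point made above about legacy data.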

    Processes that modify legacy data for re-export to a legacy system
    obviously need to be intimately familiar with the legacy conventions, in a
    way that could not possibly be documented in the Unicode Standard in all
    details for every character and every legacy system.

    Documentation in the code charts:

    I agree with several of the comments that "hiding" the information about
    special characters in running text makes it unnecessarily difficult to work
    with the information. On the other hand, not everything can be succinctly
    expressed in machine-readable tables (some characters have complicated
    usages), and even annotations in the names list have limits. They are
    definitely not the place for lengthier discussions.

    For Unicode 4.0 we have attempted to improve the situation by systematically
    extracting the information related to line breaking into UAX #14, which at
    least allows task-focused access. Information about the mathematical usage
    of characters is now collected in one place, UTR #25, which partially
    duplicates and partially extends the information in the text of the
    standard but provides a single point of access. Further improvements are
    possible.
    Personally, I'd be in favor of an icon in the character names list that
    simply indicates that a character is discussed more fully elsewhere; that
    would make the code charts more useful as an index into the description of
    the characters.

    Mathematical operators:

    Future extensions of programming languages should allow not only the MINUS
    SIGN as an operator, but many other characters, for example LOGICAL AND and
    LOGICAL OR, and as many other operators as appropriate for the language.

    The operators need not be entered via a special-purpose keyboard. Input
    macros, editor substitution, or similar input technologies (e.g. turning
    && into LOGICAL AND) would be more flexible. Some editors already display
    highly formatted program source code even though the underlying text
    backbone uses the standard ASCII conventions of current programming
    languages. Just one example is Source Insight from, which not only
    represents >= etc. by single symbols, but can also correctly increase the
    size of outer parentheses for nested expressions.
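
As a rough illustration of display-time substitution over an ASCII backbone, the sketch below maps a few ASCII operator spellings to the Unicode characters LOGICAL AND (U+2227), LOGICAL OR (U+2228), GREATER-THAN OR EQUAL TO (U+2265), LESS-THAN OR EQUAL TO (U+2264) and NOT EQUAL TO (U+2260). The table and function names are hypothetical, and this is not how Source Insight or any other editor is known to work. It assumes a C-like language where double-quoted string literals must be left alone.

```python
# Hypothetical display table: ASCII operator spellings -> Unicode math characters.
OPERATOR_GLYPHS = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
    "<=": "\u2264",  # LESS-THAN OR EQUAL TO
    "!=": "\u2260",  # NOT EQUAL TO
}
# Try longer spellings first so a longer entry would win over a shorter prefix.
_SPELLINGS = sorted(OPERATOR_GLYPHS, key=len, reverse=True)


def display_form(source: str) -> str:
    """Return a display rendering of ASCII source; the stored text is not changed."""
    out = []
    i = 0
    in_string = False
    while i < len(source):
        ch = source[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Copy the escaped character verbatim so \" does not end the literal.
                out.append(source[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
            continue
        if ch == '"':
            in_string = True
            out.append(ch)
            i += 1
            continue
        for spelling in _SPELLINGS:
            if source.startswith(spelling, i):
                out.append(OPERATOR_GLYPHS[spelling])
                i += len(spelling)
                break
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because only the rendering changes, the file remains valid for today's compilers. Applying the same table at input time instead would store the symbols themselves, which only makes sense for a language that accepts them. A real implementation would tokenize per language; this sketch, for example, would render ">>=" as ">≥".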


    This archive was generated by hypermail 2.1.5 : Tue Aug 19 2003 - 09:34:07 EDT