From: Asmus Freytag (email@example.com)
Date: Tue Aug 19 2003 - 08:57:25 EDT
The recommendations for compatibility characters are necessarily vague,
since their use in legacy data (and legacy environments) is strongly
dependent on what is (or was) customary in a given environment.
If a process merely warehouses text data (or parses only a very small
subset of characters for a special purpose, as an HTML parser does), then
simply preserving legacy characters is often the best strategy. However,
take the opposite example: a process that actually scans the text for
roman numerals. In that case, ignoring the compatibility characters would
be a mistake, since legacy data of the kind for which these compatibility
characters were added would *only* contain roman numerals in this form.
They would *not* use the ASCII characters.
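To make the roman numeral case concrete, here is a minimal Python sketch,
not a definitive implementation. It assumes the compatibility characters in
question are the Number Forms code points U+2160..U+217F and uses NFKC
normalization to obtain their ASCII equivalents. The names ASCII_ROMAN,
COMPAT_ROMAN and find_compat_roman_numerals are made up for illustration.

```python
import re
import unicodedata

# A scanner that only knows the ASCII convention (illustrative, deliberately naive).
ASCII_ROMAN = re.compile(r"\b[IVXLCDM]+\b")

# Assumption: the legacy data uses the compatibility Roman numerals
# U+2160..U+217F from the Number Forms block.
COMPAT_ROMAN = re.compile("[\u2160-\u217F]+")


def find_compat_roman_numerals(text):
    """Return (original, ascii_equivalent) pairs for compatibility numerals."""
    found = []
    for match in COMPAT_ROMAN.finditer(text):
        original = match.group()
        # NFKC applies the compatibility decomposition, e.g. U+2167 -> "VIII".
        found.append((original, unicodedata.normalize("NFKC", original)))
    return found


legacy_text = "Chapter \u2167"
print(ASCII_ROMAN.findall(legacy_text))          # the ASCII-only scan finds nothing
print(find_compat_roman_numerals(legacy_text))   # the compatibility-aware scan finds U+2167
```

Note that the sketch leaves the stored text untouched and normalizes only
the matched span, which is in keeping with preserving legacy characters
wherever the process does not need to interpret them.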
Processes that modify legacy data for re-export to a legacy system
obviously need to be intimately familiar with the legacy conventions, in a
way that could not possibly be documented in the Unicode Standard in full
detail for every character and every legacy system.
Documentation in the code charts:
I agree with several of the comments that "hiding" the information about
special characters in running text makes it unnecessarily difficult to work
with the information. On the other hand, not everything can be succinctly
expressed in machine readable tables (some characters have complicated
usages), and even annotations in the name list have limits. They are
definitely not the place for lengthier discussions.
For Unicode 4.0 we have attempted to improve the situation by systematically
extracting the line-breaking related information into UAX#14, which at
least allows task-focused access. Information about mathematical usage of
characters is now collected in one place in UTR#25, partially duplicating
and partially extending the information in the text of the standard, but
providing a single place of access. Further improvements are possible.
Personally, I'd be in favor of some icon in the character names list that
simply indicates that a character is more fully discussed elsewhere - that
would make the code charts more useful as an index into the description of
Future extensions of programming languages should allow not only the MINUS
sign as an operator, but many other characters, for example LOGICAL AND and
LOGICAL OR, and as many other operators as appropriate for the language.
Input of the operators does not necessarily have to be done via a
special-purpose keyboard. The use of input macros, editor substitution, or similar
input technologies (e.g. turning && into LOGICAL AND) would be more
flexible. Some editors already support the display of highly formatted
program source code even though the underlying text backbone uses the
standard ASCII conventions of current programming languages. Just one
example is Source Insight from www.sourceinsight.com, which not only
represents >= etc. by single symbols, but can also correctly increase the
size of outer parentheses for nested expressions.
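As a rough illustration of the editor substitution idea, the following
Python sketch replaces ASCII operator sequences with single Unicode
operator characters. The SUBSTITUTIONS table and the substitute_operators
function are hypothetical and not taken from any editor or language. The
sketch is deliberately naive: it would also rewrite operators that occur
inside string literals or comments, which a real tool would have to avoid
by tokenizing the source first.

```python
import re

# Hypothetical substitution table: ASCII sequence -> single Unicode operator.
SUBSTITUTIONS = {
    "&&": "\u2227",  # LOGICAL AND
    "||": "\u2228",  # LOGICAL OR
    ">=": "\u2265",  # GREATER-THAN OR EQUAL TO
}

# Try longer sequences first so a longer operator is never split by a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(seq) for seq in sorted(SUBSTITUTIONS, key=len, reverse=True))
)


def substitute_operators(source):
    """Replace each known ASCII operator sequence with its Unicode symbol."""
    return _PATTERN.sub(lambda match: SUBSTITUTIONS[match.group()], source)


print(substitute_operators("if (a && b || c >= d)"))
```

The same table could be applied at display time only, leaving the
underlying file in ASCII, which is the approach described above for
editors that format existing source code.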