Re: Unicode in source code. WHY?

From: Sandra O'donnell USG (odonnell@zk3.dec.com)
Date: Tue Jul 20 1999 - 15:27:59 EDT


Instead of debating whether it's right or wrong to use ASCII only
in source code, or to use the full palette of Unicode characters,
it's probably more useful to consider the advantages and disadvantages
of both practices.

With an ASCII-only restriction, you can be fairly confident that
most existing software will be able to process the source code.
Also, given that English-based source code currently is more prevalent
than that of other natural languages, the chances are pretty good
that programmers in other countries will be able to make some sense
of the code, if not understand it completely. The disadvantage of
an ASCII-only restriction, of course, is that programmers who speak
languages other than English cannot use the characters and words
that are most meaningful to them.

If the full Unicode repertoire is permitted in identifiers and
elsewhere in the source code, that makes it easy for each nation's
programmers to write code that is meaningful to them. This may
reduce maintenance costs. For example, if source code written and
maintained in Japan uses Japanese identifiers, strings, and comments,
the Japanese maintainers may have an easier time with the code
than if it is restricted to ASCII only. After all, American engineers
use identifier names like "month", "employee_name", "count", etc.,
because they give some information about the identifier's purpose.
Engineers in other countries would like the same ability to use
meaningful names.

The disadvantage of using more-than-ASCII in source code is that
not all existing software can process it, so there may be a portability
issue. Also, if the software ends up being maintained or enhanced by
people who speak different languages than those who wrote it, the
maintenance/enhancement budgets may go up. When I was at the Open
Software Foundation, we used software from Siemens-Nixdorf in part
of the Distributed Computing Environment (DCE). Not surprisingly,
this software had comments (and messages?) in German, and our
engineers spent a LOT of time working with the original code
developers to translate the German information. Of course, most of
our American engineers couldn't understand the German (you know how
bad Americans are with other languages :-) ), but we also found that
engineers from other countries who could manage with English in
source code couldn't handle the German.

When weighing the advantages and disadvantages of ASCII or
more-than-ASCII in source code, each company has to assess its
individual situation. Portability may be an important attribute
for one company, in which case it may restrict itself to ASCII.
Portability may not be important at all to another, so it may choose
Unicode. And so on, down the list of pros/cons.

                -- Sandra
-----------------------
Sandra Martin O'Donnell
Compaq Computer Corporation
sandra.odonnell@compaq.com
odonnell@zk3.dec.com
