RE: Unicode in source code. WHY?

From: Jonathan Rosenne (rosenne@qsm.co.il)
Date: Wed Jul 21 1999 - 02:55:05 EDT


We know that most current programming environments do not support Unicode;
many are restricted to US-ASCII and don't even allow characters with codes
128-255 in comments.

We also know that those environments that do allow the use of Unicode are
not all compatible. I see two main problems:

1. Should the full Unicode repertoire be allowed, or just a subset?

2. When are two identifiers to be considered equivalent?
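Both questions can be made concrete with a small sketch. The policies and function names below (is_ascii_identifier, identifiers_equivalent) are hypothetical illustrations, not anything this thread or the Consortium has specified. The equivalence rule assumes NFKC normalization, which is the rule Python 3 later adopted for its own identifiers (PEP 3131).

```python
import unicodedata


def is_ascii_identifier(name: str) -> bool:
    """Hypothetical 'subset' policy: accept only US-ASCII identifiers."""
    return name.isascii() and name.isidentifier()


def identifiers_equivalent(a: str, b: str) -> bool:
    """Hypothetical equivalence policy: two identifiers are the same
    if their NFKC-normalized forms are identical."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# "cafe" with an acute accent, spelled two ways:
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                        # False: different code points
print(identifiers_equivalent(precomposed, decomposed))  # True after normalization
print(identifiers_equivalent("\uff43ount", "count"))    # True: fullwidth "c" folds to "c"
print(is_ascii_identifier("employee_name"))             # True
print(is_ascii_identifier(precomposed))                 # False under the ASCII subset
```

The example shows why the second question matters: without an agreed normalization rule, two identifiers that look identical on screen can be treated as distinct by a compiler.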

I suggest that the Unicode Consortium address these issues, and any others,
in a Technical Report (TR).

Jony

> -----Original Message-----
> From: Sandra O'donnell USG [mailto:odonnell@zk3.dec.com]
> Sent: Tuesday, July 20, 1999 9:22 PM
> To: Unicode List
> Cc: mohrin@sharmahd.com; odonnell@zk3.dec.com
> Subject: Re: Unicode in source code. WHY?
>
>
> Instead of debating whether it's right or wrong to use ASCII only
> in source code, or to use the full palette of Unicode characters,
> it's probably more useful to consider the advantages and disadvantages
> of both practices.
>
> With an ASCII-only restriction, you can be fairly confident that
> most existing software will be able to process the source code.
> Also, given that English-based source code currently is more prevalent
> than that of other natural languages, the chances are pretty good
> that programmers in other countries will be able to make some sense
> of the code, if not understand it completely. The disadvantage of
> ASCII-only software, of course, is that it is impossible to use
> the characters and words that are most meaningful for those who
> speak languages other than English.
>
> If the full Unicode repertoire is permitted in identifiers and
> elsewhere in the source code, that makes it easy for each nation's
> programmers to write code that is meaningful to them. This may
> reduce maintenance costs. For example, if source code written and
> maintained in Japan uses Japanese identifiers, strings, and comments,
> the Japanese maintainers may have an easier time with the code
> than if it is restricted to ASCII only. After all, American engineers
> use identifier names like "month", "employee_name", "count", etc.,
> because they give some information about the identifier's purpose.
> Engineers in other countries would like the same ability to use
> meaningful names.
>
> The disadvantage of using more-than-ASCII in source code is that
> not all existing software can process it, so there may be a portability
> issue. Also, if the software ends up being maintained or enhanced by
> people who speak different languages than those who wrote it, the
> maintenance/enhancement budgets may go up. When I was at the Open
> Software Foundation, we used software from Siemens-Nixdorf in part
> of the Distributed Computing Environment (DCE). Not surprisingly,
> this software had comments (and messages?) in German, and our
> engineers spent a LOT of time working with the original code
> developers to translate the German information. Of course, most of
> our American engineers couldn't understand the German (you know how
> bad Americans are with other languages :-) ), but we also found that
> engineers from other countries who could manage with English in
> source code couldn't handle the German.
>
>
> When deciding between the advantages and disadvantages of ASCII or
> more-than-ASCII in source code, each company has to assess its
> individual situation. Portability may be an important attribute
> for one company, in which case it may restrict itself to ASCII.
> Portability may not be important at all to another, so it may choose
> Unicode. And so on, down the list of pros/cons.
>
> -- Sandra
> -----------------------
> Sandra Martin O'Donnell
> Compaq Computer Corporation
> sandra.odonnell@compaq.com
> odonnell@zk3.dec.com
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT