Re: Unicode in Source Code (Ada95 and Java)

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Jul 20 1999 - 05:04:21 EDT


Murray Sargent wrote on 1999-07-20 01:16 UTC:
> An example where non-ASCII identifiers are really useful is in coding up
> mathematical formulae that contain Greek letters. For example, a program is
> much more readable if you use U+3B1 for alpha rather than spelling out the
> name alpha. Similarly U+3C0 for pi. Hopefully C++ will follow Java's
> excellent example and allow Unicode alphabetics in variable names.

Ada95 is even younger than Java and it is the first ISO standardized
programming language that was designed after the publication of ISO
10646-1. Of course, Ada95 - like Java - also uses UCS as its internal
character set. However, the Ada95 revision team explicitly decided not
to follow Java's path and allowed only the Latin-1 letters in
identifiers. The Ada community is very concerned about safety issues
and about the readability of source code, because Ada is widely deployed
today in safety critical environments (most avionics software is written
in Ada for instance). Unicode contains a quite large number of
characters that are difficult - if not impossible - to distinguish
visually. A safety requirement for Ada identifiers is that it must be
easy for human readers to decide whether two identifiers are different
or equal. Unicode characters such as U+00D0 (capital eth), U+0110
(capital D with stroke) and U+0189 (capital African D), which look
practically identical in most fonts, introduce a lot of potential
hazards that are best avoided by not allowing too rich a repertoire of
characters in identifiers.
Note however that the Ada95 standard does allow implementations to offer
"non-standard" optional modes that do allow additional UCS characters in
identifiers.
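
To make the hazard concrete, here is a minimal sketch in Python (chosen
only for brevity; it is not part of the Ada or Java discussion). It
prints the code points and Unicode names of the three letters mentioned
above and applies a crude Latin-1 range check. The helper names
describe and is_latin1_only and the sample identifiers are made up for
illustration, and the range check is only a stand-in for the Ada95
rule, not the full identifier syntax.

```python
import unicodedata

# The three capital letters named above; most fonts draw them alike.
LOOKALIKES = ["\u00D0", "\u0110", "\u0189"]

def describe(ch):
    """Return the code point (as 'U+XXXX') and the Unicode name of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

def is_latin1_only(identifier):
    """Crude stand-in for the Ada95 rule: every character is in Latin-1.

    This checks the code point range only, not the full identifier syntax.
    """
    return all(ord(ch) <= 0xFF for ch in identifier)

# Hypothetical identifiers that differ only in their first letter.
identifiers = [ch + "elta" for ch in LOOKALIKES]

for name in identifiers:
    code_point, char_name = describe(name[0])
    print(code_point, char_name, "Latin-1 only:", is_latin1_only(name))

# Compatibility normalization (NFKC) does not merge them either.
distinct_after_nfkc = {unicodedata.normalize("NFKC", s) for s in identifiers}
print(len(distinct_after_nfkc), "distinct identifiers")
```

Only the identifier starting with U+00D0 passes the Latin-1 check, so
under that restriction the other two look-alikes cannot appear in an
identifier at all.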

Have a look at:

  Ada95 Reference Manual, ISO/IEC 8652:1995(E), Section 2.1: Character Set,
  http://wuarchive.wustl.edu/languages/ada/userdocs/docadalt/rm95/02.htm

  http://www.cl.cam.ac.uk/~mgk25/ada.html

Markus
(who decided to use Ada95 for his PhD implementation project, because
the language is at least as nice and modern as Java, but its compilers
produce far more efficient native machine code.)

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


