Re: Unicode in source code. WHY?

From: Paul Keinanen (keinanen@sci.fi)
Date: Tue Jul 20 1999 - 14:10:15 EDT


At 06:54 20.7.1999 -0700, Torsten Mohrin wrote:
>
>Can someone give me at least one really good reason, why I should use
>Unicode in identifiers in programming languages? What's wrong with
>English and ASCII (and I mean "ASCII") and [A-Za-z_] ?

Here in Finland many application programmers have used Finnish identifier
names for decades (although system programmers usually use English
identifier names). If the design is in Finnish and refers to existing local
entities, it is much easier to use the original name for an entity than to
try to translate the name into English (and two programmers would come up
with different translations :-).

The main problem is that the characters Ä (A with diaeresis) and Ö (O with
diaeresis) had to be written as A and O respectively. With a limited
vocabulary (the set of identifiers) and compilation units of limited size,
this does not cause much ambiguity. Unfortunately, programmers also used
this fallback when writing plain text (e.g. e-mail), where the practically
unlimited vocabulary of natural language produces many more ambiguous
situations.
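
As a rough illustration of the ambiguity described above, here is a minimal
Python 3 sketch. The helper names (ascii_fallback, collisions) and the word
list are assumptions chosen for the example, not anything from an existing
tool. It folds Ä/Ö to A/O and reports which distinct words end up with the
same spelling.

```python
from collections import defaultdict

# The ASCII fallback discussed above: drop the diaeresis from Ä and Ö.
FALLBACK = str.maketrans({"Ä": "A", "Ö": "O", "ä": "a", "ö": "o"})


def ascii_fallback(word):
    """Return the word as it would be typed with plain A and O."""
    return word.translate(FALLBACK)


def collisions(words):
    """Group words by fallback spelling; keep groups with more than one word."""
    groups = defaultdict(set)
    for word in words:
        groups[ascii_fallback(word)].add(word)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Hypothetical sample: säde (ray) / sade (rain), sää (weather) / saa (gets).
sample = ["säde", "sade", "sää", "saa", "määrä"]
print(collisions(sample))
# {'sade': ['sade', 'säde'], 'saa': ['saa', 'sää']}
```

With a handful of identifiers such collisions are rare and easy to spot; the
larger the vocabulary, the more groups a check like this would report.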

My guess is that once Latin-1 or Unicode is allowed in identifier names,
Ä and Ö will be used extensively in identifier names in many application
programs.

Some programming languages (notably C and C++) use case-sensitive
identifier names (thus "number", "Number" and "NUMBER" are three separate
identifiers) to generate a large number of identifiers from only a few
characters. Trying to help people over the phone when a case-sensitive
programming language or a case-sensitive command language (shell) is in use
is a nightmare :-).

With programming languages that support Unicode in identifiers, it is
possible to write application-specific code, e.g. for Greek or Russian
applications, using Greek or Cyrillic characters for application-specific
entities and Latin identifiers for system call parameters. Thus, there would
be no identifier name ambiguity between application and system identifiers,
and thus no need for dubious case-sensitive identifiers :-).
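
A small sketch of that split, written in Python 3, which accepts non-ASCII
letters in identifiers. The function and its Russian names (сумма_заказа,
"order total"; цены, "prices"; количества, "quantities") are hypothetical,
invented for this illustration. The application-level names are Cyrillic,
while the library side (math.fsum, zip) stays Latin.

```python
import math


def сумма_заказа(цены, количества):
    """Order total: sum of price * quantity over all order lines."""
    # Cyrillic names belong to the application; math.fsum and zip are
    # the Latin-named library ("system") side.
    return math.fsum(
        цена * количество for цена, количество in zip(цены, количества)
    )


print(сумма_заказа([2.5, 4.0], [2, 1]))
```

The script itself marks which layer a name belongs to, so an application
identifier cannot be confused with a library one.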

Paul Keinänen


