L2/98-307

L2/98-308
Report on offline meeting at the SC22 plenary
regarding TR 10176, Annex A
September 28, 1998

L2 members:
Keld Simonsen just circulated to WG20 the following report on the SC22
plenary held in Denmark last August. During that meeting there
apparently were some important side discussions between Keld
and the various experts on C, C++, Cobol, Fortran, and Lisp
about identifier syntax.
Please consider the discussion as reported below. It should
clarify the de jure importance of TR 10176 Annex A for
SC22-based language standards. This should make it clear why
I have been rabble-rousing about the mistakes in that Annex
and their relation to the identifier suggestions made by
Unicode and implemented in Java. Participating in sorting this
out should be a high priority for L2, in my opinion.

--Ken

----- Begin Included Message -----
From SC22WG20-request@dkuug.dk Mon Sep 28 14:33:43 1998
Date: Mon, 28 Sep 1998 23:13:22 +0200 (CEST)
To: sc22wg20@dkuug.dk
Subject: (SC22WG20.2381) report on meeting in conjunction with the recent SC22 plenary
Title: Report from the SC22 plenary 1998
Date: 1998-09-18
Source: Keld Simonsen
Status: Expert contribution

At the SC22 plenary in Snekkersten, Denmark, August 1998, a number
of WG20 related issues were discussed in offline discussions.

These were in addition to the issues already addressed and reported
in SC22 resolutions:

- Change of title of 14651
- Assignment of 15897 to WG20
- Review by WG20 of Electronic Commerce

Issues that need further work were:

- Unicode C liaison
- Revision of TR 10176 with respect to Annex A

The offline discussions were attended by the conveners of the Cobol,
C, Fortran, C++, and Lisp WGs, and by three experts from WG20.

The discussion was prompted by questions from the Cobol WG,
especially about TR 10176 Annex A.
It was agreed that the list should be a positive list
of characters allowed in identifiers, and also that this
list should remain stable over several years (say, 3-5 years).
Participants were satisfied with the content of TR 10176 Annex A,
and the C, C++, and Cobol WGs reported that they were using this
specification in their standards.
C++ had used an earlier version of Annex A, and an amendment
updating the list in the newly approved C++ standard will be
proposed. It was noted that Java currently uses a similar but
different specification.

One problem noted was that TR 10176 Annex A does not distinguish
between letters and digits, whereas most languages do not
allow a digit as the first character of an identifier.
It was noted that the i18n FDCC-set of the proposed
14652 has this information, in the alpha and digit classes.
There was no agreement on whether to guide languages generally
to accept the combined set of letter-like symbols and digits
as the first character of identifiers, or to accept only the
i18n alpha class.
The C WG advised using its "model B" for characters in identifiers,
which allows non-UCS characters in identifiers.
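To make the letter/digit problem concrete, here is a minimal Python sketch of a positive-list identifier check that keeps letters and digits apart. It assumes a tiny, made-up subset of ranges; it is NOT the real Annex A table and NOT the 14652 i18n alpha/digit classes. The names LETTER_RANGES, DIGIT_RANGES, in_ranges, is_identifier, and the allow_digit_first flag are all hypothetical. The flag models the two options on which no agreement was reached.

```python
# Hypothetical, heavily abbreviated positive list in the spirit of
# TR 10176 Annex A. These ranges are illustrative only; they are NOT
# the actual Annex A table. Unlike Annex A, letters and digits are
# kept in separate lists so the first character can be checked.
LETTER_RANGES = [
    (0x0041, 0x005A),  # Basic Latin A-Z
    (0x0061, 0x007A),  # Basic Latin a-z
    (0x00C0, 0x00D6),  # Latin-1 letters (U+00D7 MULTIPLICATION SIGN excluded)
    (0x00D8, 0x00F6),  # Latin-1 letters (U+00F7 DIVISION SIGN excluded)
    (0x00F8, 0x00FF),
    (0x0391, 0x03A1),  # Greek capitals (U+03A2 is unassigned)
    (0x03A3, 0x03A9),
    (0x03B1, 0x03C9),  # Greek small letters
]
DIGIT_RANGES = [
    (0x0030, 0x0039),  # ASCII digits 0-9
]


def in_ranges(ch: str, ranges: list[tuple[int, int]]) -> bool:
    """Return True if the code point of ch falls in any inclusive range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_identifier(s: str, allow_digit_first: bool = False) -> bool:
    """Check s against the (hypothetical) positive list.

    allow_digit_first selects between the two policies left open:
    letters only as the first character (default), or the combined
    set of letters and digits.
    """
    if not s:
        return False
    first_ok = in_ranges(s[0], LETTER_RANGES) or (
        allow_digit_first and in_ranges(s[0], DIGIT_RANGES)
    )
    if not first_ok:
        return False
    # Every later character must be on the positive list: a letter or a digit.
    return all(
        in_ranges(c, LETTER_RANGES) or in_ranges(c, DIGIT_RANGES) for c in s[1:]
    )


print(is_identifier("Größe"))  # True
print(is_identifier("1x"))  # False
print(is_identifier("1x", allow_digit_first=True))  # True
print(is_identifier("a×b"))  # False
```

The point of the split is that a single undifferentiated list, as in Annex A, cannot express "no digit first"; a language would have to supply the digit set itself, which is the information the 14652 alpha and digit classes provide.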

The C WG warned that there may be problems at the linkage
level, both between different implementations and between
different programming languages.

For sorting, it was advised that not only the 14651
specifications be followed, but also that national sorting
specifications be honoured.

Keld Simonsen advised that, to guarantee portability, the compilation
of programs should not be locale-dependent, and that the i18n FDCC-set
of 14652 be used for character set properties.

The newly adopted cultural registry standard 15897 was seen as a
natural place to find national specifications.

All in all, I found considerable interest in these cultural issues,
and considerable interest from other SC22 WGs in using WG20
specifications. I therefore advise holding more meetings with the
other SC22 WGs in the future, possibly most conveniently in
connection with future SC22 plenaries.

Another conclusion for me is that there is a need for further guidance
to programming languages on these issues, and I would ask WG20 to
consider providing such guidance, possibly in the form of a revision
of TR 10176 in this area.

----- End Included Message -----