Naming of functional ASCII characters in Unicode

From: Bernd Warken (bwarken@mayn.de)
Date: Mon Jun 05 2000 - 12:57:06 EDT


The Unicode ASCII range U+0000-007F still shows elements of the outdated
glyph approach instead of the intended character abstraction. This mail
tries to point out some places where Unicode uses a text-oriented
naming, though a functionally oriented naming would be more suitable.

Historically, the 7-bit ASCII characters were used for databases and
programming languages. In later years, text processing required better
representations for some of these functional characters. This led to
extensions like the well-known code-pages, ISO character standards, and
Unicode.

So the primary task of the ASCII-7 code is programming, not text
processing. This makes the ASCII characters primarily functional.
Unicode usually honors this fact by providing alternative characters in
order to get more beautiful, printable look-alikes.

Unfortunately, some names (and glyphs) do not reflect this functional
meaning.

This might not seem a big problem today, but some interpreted languages
are considering, in the long term, allowing wide characters in program
code. At that point, the difference between functional characters and
printable representations will become crucial.

" U+0022 QUOTATION MARK

The character " is a double quote - just look at it. Most classical
programming languages use it to denote strings. In programming
documentation, it is referred to as the DOUBLE QUOTE character, whereas
`quotation' refers to text processing.

The Unicode name QUOTATION MARK for this character is ambiguous.
Quotation marks vary greatly between languages; e.g., English, German,
and French differ considerably. Unicode provides all characters
needed for these quotation concepts. As these characters have codes
outside the ASCII range, the name for U+0022 may be changed
without loss of generality.

Renaming it to DOUBLE QUOTE would uniquely characterize the character,
mark it as a functional character, and relate it to the other ASCII
quotes. The additional name could be NEUTRAL QUOTATION MARK.
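
As a rough illustration of the point that the language-specific
quotation marks already live outside the ASCII range, the following
Python sketch prints the current Unicode names of U+0022 and of a few
such characters. The selection in QUOTES and the helper name describe
are my own choices for illustration, not anything taken from the
Unicode documentation.

```python
import unicodedata

# U+0022 itself, followed by some non-ASCII quotation marks used in
# English, German, and French typography (an illustrative selection).
QUOTES = ["\u0022", "\u201C", "\u201D", "\u201E", "\u00AB", "\u00BB"]

def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

if __name__ == "__main__":
    for ch in QUOTES:
        print(describe(ch))
```

Only the first line of the output refers to an ASCII character; all the
others carry names that already say which kind of quotation mark they
are.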

' U+0027 APOSTROPHE

The Unicode documentation for this character says `preferred character
for apostrophe is 2019'. So the main name APOSTROPHE is even documented
as being wrong.

It should be renamed according to its function, i.e., SINGLE QUOTE or
RIGHT QUOTE. These names were used for decades before Unicode changed
them. The 3rd name `APL quote' looks like a possible candidate, but I
have never found this name in any software documentation.

Again, APOSTROPHE is text-oriented, while * QUOTE is functional.

A second problem with this character is its construction as `a neutral
(vertical) glyph having mixed usage'. ASCII-7 based terminals (the
basic programming environment) usually display the single quote as a
raised 9 quote or with a right slope, making it into a `right quote' and
not a `mixed vertical quote'. Moreover, its inclination helps to
distinguish it from its antipode ` (U+0060).

Maybe some new character VERTICAL APOSTROPHE should be defined for the
actual concept.

` U+0060 GRAVE ACCENT

Again the naming indicates a text-oriented approach. Like before, this
character is basically functional, e.g., in POSIX shell programming.

The pointers in the Unicode documentation show that for each of the
different textual usages an alternative look-alike character was
specified. So only its functional meaning is left and should be
reflected in its naming.

Traditionally, this character was called `back-quote' or `left quote' to
correlate it with ' (U+0027). So it should be renamed to BACK-QUOTE.
Depending on how the glyph is constructed, its additional name should be
LEFT QUOTE or STAND-ALONE GRAVE ACCENT.

^ U+005E CIRCUMFLEX ACCENT

This character has always been known as the `caret' in POSIX wildcards
and regular expressions. So it should be renamed to CARET, possibly
with the additional name STAND-ALONE CIRCUMFLEX ACCENT. The IPA
character U+028C LATIN SMALL LETTER TURNED V shouldn't interfere.
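
The code points and current names of the three ASCII characters
discussed so far, and of the two non-ASCII characters mentioned next to
them, can be checked with a small Python sketch. The helper name
codepoint_and_name and the list CHARS are hypothetical and only serve
this illustration.

```python
import unicodedata

# The ASCII characters ', ` and ^, followed by U+2019 and U+028C,
# which the text mentions as related non-ASCII characters.
CHARS = ["'", "`", "^", "\u2019", "\u028C"]

def codepoint_and_name(ch):
    """Return the pair ('U+XXXX', Unicode name) for one character."""
    return ("U+%04X" % ord(ch), unicodedata.name(ch))

if __name__ == "__main__":
    for ch in CHARS:
        print("%s %s" % codepoint_and_name(ch))
```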

- U+002D HYPHEN-MINUS

The name HYPHEN-MINUS is not suitable, for there is already a printable
hyphen, a printable minus sign, and several dashes.

Again, this character is a functional character for programming
languages (MINUS OPERATOR) and system administration (OPTION CHARACTER);
its name should reflect these uses.

The best name would be MINUS SIGN. Unfortunately, U+2212 is already
called that. But U+2212 could be renamed to what it is, a
PRINTABLE MINUS SIGN, which would free the name for U+002D.
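
To make the overlap concrete, here is a small Python sketch that lists
U+002D next to the printable hyphen, the printable minus sign, and two
of the dashes, together with their Unicode general categories. The
selection in CANDIDATES and the helper name info are assumptions made
for this example only.

```python
import unicodedata

# HYPHEN-MINUS, then the separate hyphen, minus sign, and two dashes.
CANDIDATES = ["\u002D", "\u2010", "\u2212", "\u2013", "\u2014"]

def info(ch):
    """Return (code point, Unicode name, general category) for ch."""
    return (ord(ch), unicodedata.name(ch), unicodedata.category(ch))

if __name__ == "__main__":
    for ch in CANDIDATES:
        code, name, category = info(ch)
        print("U+%04X %-14s %s" % (code, name, category))
```

Note that U+002D is classified as dash punctuation (Pd), like the
hyphen and the dashes, while only U+2212 is classified as a math
symbol (Sm).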

PRINTABLE

To avoid double names for characters, a prefix like PRINTABLE could be
used for look-alikes of other functional characters, esp. in additional
names.

Copyleft 2000 by Bernd Warken <bwarken@mayn.de>


