Re: APL Under-bar Characters from Ken Whistler on 2015-08-18 (Unicode Mail List Archive)

From: Ken Whistler <kenwhistler_at_att.net>
Date: Tue, 18 Aug 2015 10:13:15 -0700

On 8/18/2015 9:45 AM, Doug Ewell wrote:
> Ken Whistler <kenwhistler at att dot net> wrote:
>
> Then we're back to the central point that Alex Weiner originally
> expressed, in arguing for the encoding of precomposed letters with
> underbar:
>> The string length functionality would view an 'A' code point combined
>> with an '_' code point as an item that has two elements, while
>> something that looks like 'A' Should be atomic, and return a length
>> of one.
>

Precisely.

And instead of pushing for the impossible, the correct solution here
involves dividing and conquering:

1. If the issue is just the *presentation* of legacy APL materials showing
the traditional IBM uppercase italic letters with underscores, then
fix some fonts, use the combining character sequences (or styling,
makes no matter), and edit away with existing characters, and with
no implications for APL implementations.

2. If the issue is *augmentation* of APL implementations to have an
additional A-Z set of character symbols, beyond the upper- and lowercase
ones apparently supported by most APL fonts and implementations,
then pick one of the existing, encoded, mathematical alphabets
and have done with it. There are 13 to choose from! The sans-serif
italic set might make a nice choice. And for the cherry on top, in
the APL fonts, draw a non-connecting underline beneath your
26 new letters to please traditionalists.

The reason to do #2 is that the implementations of APL, because of
the very nature of the language, need their "characters" to have
a fixed size, so that each element of a data array of "characters"
is exactly one "character".
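To make the length argument concrete, here is a minimal Python sketch (not APL, and not taken from any APL implementation) comparing the two options at the code point level. The helper name code_point_names is made up for illustration. U+0332 COMBINING LOW LINE stands in for the "underbar", and the sans-serif italic capitals starting at U+1D608 are just the example alphabet suggested above.

```python
import unicodedata

# Option 1: base letter plus U+0332 COMBINING LOW LINE (two code points).
combining = "A\u0332"

# Option 2: U+1D608 MATHEMATICAL SANS-SERIF ITALIC CAPITAL A (one code point).
math_a = "\U0001D608"

def code_point_names(text):
    """Hypothetical helper: list the Unicode name of each code point."""
    return [unicodedata.name(ch) for ch in text]

# The sans-serif italic capitals are contiguous, so A-Z is a simple range.
math_alphabet = [chr(0x1D608 + i) for i in range(26)]

print(len(combining), code_point_names(combining))
print(len(math_a), code_point_names(math_a))
print(unicodedata.name(math_alphabet[-1]))
```

Python counts string length in code points, so the combining sequence reports two elements and the mathematical letter reports one, which is the distinction at issue for an array of "characters".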

The oopsie for #2, of course, is that the mathematical alphabets are
encoded outside the BMP (in the block U+1D400..U+1D7FF). If your APL
implementation is actually using 16-bit code *units* for its characters,
it is still stuck in a UCS-2 world, and can't handle UTF-16, because a
surrogate pair once again breaks the ironclad rule that 1 "character"
equals one data element in the array.
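As a rough illustration of that breakage, the following Python sketch splits a string into its UTF-16 code units. The function name utf16_code_units is an assumption made for this example, and U+1D608 is again only the sample letter. Nothing here reflects how a particular APL system stores its data.

```python
def utf16_code_units(text):
    """Hypothetical helper: return the 16-bit UTF-16 code units as ints."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

bmp_letter = "A"                  # U+0041, inside the BMP
math_letter = "\U0001D608"        # supplementary-plane letter

print([hex(u) for u in utf16_code_units(bmp_letter)])
print([hex(u) for u in utf16_code_units(math_letter)])
```

The BMP letter occupies one 16-bit element, while the supplementary letter occupies two (a high and a low surrogate), so a 16-bit "character" array would see it as two elements.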

The fix for the oopsie is to upgrade the APL implementations to UTF-32.
At that point, the supplementary character problem goes away,
and APL could freely augment its sets of A-Z symbols with the
mathematical alphanumeric symbols without further ado.
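A similar sketch for the UTF-32 case, again in Python and again with a made-up helper name (utf32_elements), shows each code point landing in exactly one 32-bit element:

```python
def utf32_elements(text):
    """Hypothetical helper: return the 32-bit UTF-32 code units as ints."""
    data = text.encode("utf-32-le")  # little-endian, no byte order mark
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]

mixed = "A\U0001D608B"            # BMP, supplementary, BMP
elements = utf32_elements(mixed)

print(len(elements), [hex(e) for e in elements])
```

Note that this only fixes the supplementary character problem. A combining sequence such as "A" followed by U+0332 would still occupy two elements in UTF-32, which is why option #2 relies on single code point letters.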

What people should *not* be doing is insisting on being stuck
in 1970, as if everybody were still doing APL with IBM Selectric typewriter
terminals hooked up to IBM System/360 mainframes using an EBCDIC
APL character set, and that everything in the APL program text
has to look precisely the way it did in 1970.

--Ken
Received on Tue Aug 18 2015 - 12:14:46 CDT
