RE: APL Under-bar Characters from alexweiner_at_alexweiner.com on 2015-08-16 (Unicode Mail List Archive)

From: <alexweiner_at_alexweiner.com>
Date: Sun, 16 Aug 2015 12:36:08 -0700

Hi Ken,

You are correct in observing that most APLers use upper and lower case. I know of one of the largest APL software firms in North America that uses a version of APL containing under-bar characters (Juergen, I'll have an answer to the ⎕AV layout tomorrow). It is a customized version that proves to be... troublesome when converting to ASCII and Unicode.

It seems that some people in the APL community (at least the GNU APL part of it) really do consider under-bar letters to be as distinct as accented characters.

I have seen a "workaround" that maps APL's under-bars to Unicode's letter-in-circle:

https://en.wikipedia.org/wiki/Enclosed_Alphanumerics

Maybe using these, together with an application-layer map, is the solution to make APLers less grumpy :)
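
A minimal Python sketch of what such an application-layer map might look like. It assumes an under-bar letter is stored as a capital letter followed by U+0332 COMBINING LOW LINE, and that the circled capitals U+24B6..U+24CF stand in for them internally. The names TO_CIRCLED, encode_underbars and decode_underbars are hypothetical, not taken from any APL implementation.

```python
import string

UNDERBAR = "\u0332"  # COMBINING LOW LINE (assumed representation of the under-bar)

# Hypothetical map: "A" + U+0332 -> U+24B6 (circled A), ..., "Z" + U+0332 -> U+24CF
TO_CIRCLED = {
    letter + UNDERBAR: chr(0x24B6 + i)
    for i, letter in enumerate(string.ascii_uppercase)
}
FROM_CIRCLED = {circled: pair for pair, circled in TO_CIRCLED.items()}


def encode_underbars(text: str) -> str:
    """Replace each letter + COMBINING LOW LINE pair with one circled letter."""
    for pair, circled in TO_CIRCLED.items():
        text = text.replace(pair, circled)
    return text


def decode_underbars(text: str) -> str:
    """Inverse map: turn circled letters back into letter + COMBINING LOW LINE."""
    return "".join(FROM_CIRCLED.get(ch, ch) for ch in text)
```

After encoding, every under-bar letter occupies exactly one array element, and decoding restores the combining sequence for display or interchange.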

FWIW,

-Alex

-------- Original Message --------
Subject: Re: APL Under-bar Characters
From: Ken Whistler <kenwhistler@att.net>
Date: Sun, August 16, 2015 11:37 am
To: Khaled Hosny <khaledhosny@eglug.org>
Cc: alexweiner@alexweiner.com, unicode@unicode.org

It seems to me that APL has some very deeply embedded (and ancient)
assumptions about fixed-width 8-bit characters, dating from ASCII days.
It only got as far as it did with the current assumptions because people
hacked up 8-bit fonts for all the special characters for the APL syntax,
and because IBM implemented those as dedicated special character sets with
matching specialized APL keyboards.

A built-in function like ⍴ which returns the *size* of data is structurally
hand-in-hand with the definition of vectors and arrays. There seem to
be very deep assumptions in the APL data model that strings are simply
an array of *fixed-size* data elements, aka "characters".

So requiring ⍴,'ä' and ⍴,'A̲' to "just work" is the moral equivalent of asking the
C library call strlen("ä") or strlen("A̲") to "just work", regardless of the
representation of the data in the string. It is a nonsensical requirement
if applied to general Unicode strings outside the context of a very
carefully restricted subset designed to ensure one-to-one relationship
between "character" and "array element".
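
To make the dependence on representation concrete, here is a small Python sketch. It assumes the under-bar A is written as A followed by U+0332 COMBINING LOW LINE; the helper name sizes is made up for this illustration. It counts the same text as code points, as UTF-8 bytes (what strlen would see for UTF-8 data) and as UTF-16 code units.

```python
precomposed = "\u00e4"   # ä as the single code point U+00E4
decomposed = "a\u0308"   # a + COMBINING DIAERESIS, canonically equivalent to ä
underbar_a = "A\u0332"   # A + COMBINING LOW LINE (assumed under-bar representation)


def sizes(s: str) -> dict:
    """Return the length of s under three different notions of 'element'."""
    return {
        "code_points": len(s),
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
    }


for label, s in [("precomposed", precomposed),
                 ("decomposed", decomposed),
                 ("underbar_a", underbar_a)]:
    print(label, sizes(s))
```

Only the precomposed ä comes out as one element, and only when the element is a code point or a UTF-16 code unit; the other two forms never do.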

A Unicode-based APL implementation can (presumably) just up the size
of its "character" to 16 bits internally (actually a UTF-16 code *unit*)
and carefully restrict itself to the subset of ASCII & Latin-1, the APL
symbols and a few other operators needed to fill out the set.

Looking at the fonts people seem to actually be using in various implementations,
e.g.:

http://aplwiki.com/AplCharacters

the general choice seems to be to use both uppercase and lowercase Latin letters,
and forgo the old convention of underlined uppercase Latin letters. That seems a
small adjustment to make to not stay stuck in the 70's, frankly.

I can understand Alex's request that Unicode then effectively "solve the problem" by
providing a fixed-width 16-bit entity for "A̲" that could then just be added to
the restricted subset in the APL implementations. But that isn't going to happen --
because of the normalization stability guarantees for the Unicode Standard.
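
The current state of affairs can be checked with Python's standard unicodedata module. The sketch assumes the under-bar A is spelled A + U+0332 COMBINING LOW LINE. NFC composes a + U+0308 into the precomposed ä that was encoded for compatibility, but leaves A + U+0332 as two code points, since no precomposed form exists. As far as I understand the stability policy, a precomposed under-bar A encoded today with a canonical decomposition would still normalize back to the two-code-point sequence, so it would not help.

```python
import unicodedata

# a + COMBINING DIAERESIS composes to the single code point U+00E4 under NFC
composed = unicodedata.normalize("NFC", "a\u0308")

# A + COMBINING LOW LINE has no precomposed form, so NFC leaves it unchanged
underbar = unicodedata.normalize("NFC", "A\u0332")

print(len(composed), len(underbar))
```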

And in any case, if users of APL need something more sophisticated for actual
string handling than strictly limited subsets based on the assumption that
character=element_of_fixed_data_size_array, then rho and a limited subset
aren't going to handle it anyway. At that point, another layer of abstraction
would have to be built on top of the basic array and vector processing. And
then Khaled's points about character=grapheme_cluster become relevant.
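
A rough Python sketch of what such a layer might do: group each base character with the combining marks that follow it, and take the "size" of the result instead of the raw element count. This is only an approximation of grapheme clusters (the full rules are in UAX #29), and the function name approx_clusters is made up for the illustration.

```python
import unicodedata


def approx_clusters(s: str) -> list:
    """Group each character with any combining marks that follow it.

    Approximation only: it attaches characters with a non-zero canonical
    combining class to the preceding element and ignores the other
    UAX #29 rules (Hangul syllables, ZWJ sequences, and so on).
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the previous base
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


name = "A\u0332B\u0332C"  # under-bar A, under-bar B, plain C
print(len(name), len(approx_clusters(name)))
```

Here the raw length is 5 elements, while the cluster count is the 3 "characters" a user would expect.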

--Ken

On 8/16/2015 9:53 AM, Khaled Hosny wrote:
> On Sun, Aug 16, 2015 at 09:31:25AM -0700, alexweiner@alexweiner.com wrote:
>> So I'm not sure why the allowance was made for ä as well as other certain
>> characters, but not for other things (under-bar characters) that face
>> similar representation issues.
>
> It was encoded for compatibility of pre-existing character sets AFAIK.
>
> Regards,
> Khaled

Received on Sun Aug 16 2015 - 14:37:15 CDT
