Re: APL Under-bar Characters

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 16 Aug 2015 19:27:13 +0100

On Sun, 16 Aug 2015 18:53:52 +0200
Khaled Hosny <khaledhosny_at_eglug.org> wrote:

> On Sun, Aug 16, 2015 at 09:31:25AM -0700, alexweiner_at_alexweiner.com
> wrote:

> > Now, the ä character has a precomposed form in Unicode, and if you
> > couple that with the NFC normalisation form, you'd get the above
> > expression to return 1.

> > So I'm not sure why the allowance was made for ä and certain other
> > characters, but not for other things (under-bar characters) that
> > face similar representation issues.

> It was encoded for compatibility with pre-existing character sets, AFAIK.
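The asymmetry being asked about can be seen with Python's standard `unicodedata` module. This is a minimal sketch: `nfc_len` and `nfd_len` are hypothetical helper names, and "length" here means the number of code points, which is only one possible notion of string length. U+0332 COMBINING LOW LINE stands in for an APL-style under-bar.

```python
import unicodedata


def nfc_len(s: str) -> int:
    """Number of code points after NFC normalisation."""
    return len(unicodedata.normalize("NFC", s))


def nfd_len(s: str) -> int:
    """Number of code points after NFD normalisation."""
    return len(unicodedata.normalize("NFD", s))


# 'a' + COMBINING DIAERESIS: a precomposed form (U+00E4) exists.
a_umlaut = "a\u0308"
# 'A' + COMBINING LOW LINE: no precomposed under-bar letter exists.
a_underbar = "A\u0332"

for label, s in [("a + diaeresis", a_umlaut), ("A + low line", a_underbar)]:
    print(f"{label}: NFC length {nfc_len(s)}, NFD length {nfd_len(s)}")
```

NFC shortens the umlauted letter to one code point, while the under-barred letter stays at two code points in either form, because there is nothing for it to compose into.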

Note that compatibility means allowing the habit of treating
precomposed characters as single characters to continue. That habit
allowed a simple transition, but it now causes confusion. Most rules work
better in NFD than in NFC. For string lengths in NFC (counting code
points after renormalising the concatenation), you immediately lose the
rule len(a + b) = len(a) + len(b); you don't even have
len(a + b) <= len(a) + len(b). In NFD, renormalising the concatenation
only reorders combining marks, so the equality holds. However, do note
that for the corresponding 'string' algebra, the mathematical concept of
a string no longer works, and this applies to both NFC and NFD. Instead,
you have to allow certain pairs of characters (combining marks with
different non-zero canonical combining classes) to commute, and so you
get the concept of a 'trace'.
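A minimal sketch of the length argument and of commuting marks, assuming "length" means code points and "a + b" means the concatenation renormalised to the same form. The helper names `concat_len` and `sum_len` are hypothetical; the test strings are ordinary Unicode sequences chosen to illustrate each case.

```python
import unicodedata


def concat_len(form: str, a: str, b: str) -> int:
    """Code-point length of a + b, with both parts and the result in `form`."""
    a_n = unicodedata.normalize(form, a)
    b_n = unicodedata.normalize(form, b)
    return len(unicodedata.normalize(form, a_n + b_n))


def sum_len(form: str, a: str, b: str) -> int:
    """len(a) + len(b), each measured after normalising to `form`."""
    return len(unicodedata.normalize(form, a)) + len(unicodedata.normalize(form, b))


cases = {
    # 'a' + COMBINING DIAERESIS: NFC composes the pair into U+00E4.
    "shrinks": ("a", "\u0308"),
    # U+01DF (a with diaeresis and macron) + COMBINING DOT BELOW: canonical
    # reordering moves the dot below next to the base, 'a' + dot below
    # composes to U+1EA1, and nothing absorbs the remaining two marks.
    "grows": ("\u01df", "\u0323"),
}

for name, (a, b) in cases.items():
    for form in ("NFC", "NFD"):
        print(f"{name:7} {form}: len(a + b) = {concat_len(form, a, b)}, "
              f"len(a) + len(b) = {sum_len(form, a, b)}")

# Dot below (class 220) and circumflex (class 230) commute ...
print(unicodedata.normalize("NFD", "a\u0323\u0302") ==
      unicodedata.normalize("NFD", "a\u0302\u0323"))
# ... but diaeresis and macron (both class 230) do not.
print(unicodedata.normalize("NFD", "a\u0308\u0304") ==
      unicodedata.normalize("NFD", "a\u0304\u0308"))
```

Under NFC the first case gives 1 against 2 and the second gives 3 against 2, so neither the equality nor the inequality survives; under NFD both cases balance. The last two comparisons show why a string has to be treated as a trace: only some pairs of marks may be swapped.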

If every combination of base character and non-spacing marks were
encoded, there would be infinitely many, since nothing limits how many
marks may follow a base. Polytonic Greek has 36 *precomposed*
combinations of base character and 3 combining marks, and some languages
frequently use base characters with 4 combining marks; unexceptional
words with 5 combining marks are less frequent.
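One way to check the Polytonic Greek figure is to scan the Greek blocks and keep characters whose canonical decomposition is one base character plus exactly three combining marks. This is a sketch under the assumption that the relevant characters live in the Greek and Coptic (U+0370..U+03FF) and Greek Extended (U+1F00..U+1FFF) blocks; `precomposed_with_marks` is a hypothetical helper name.

```python
import unicodedata
from typing import List


def precomposed_with_marks(start: int, end: int, n_marks: int) -> List[str]:
    """Characters in [start, end] whose canonical decomposition (NFD) is a
    single base character followed by exactly n_marks combining marks."""
    found = []
    for cp in range(start, end + 1):
        nfd = unicodedata.normalize("NFD", chr(cp))
        if (len(nfd) == n_marks + 1
                and unicodedata.combining(nfd[0]) == 0
                and all(unicodedata.combining(m) != 0 for m in nfd[1:])):
            found.append(chr(cp))
    return found


greek = (precomposed_with_marks(0x0370, 0x03FF, 3)
         + precomposed_with_marks(0x1F00, 0x1FFF, 3))
print(len(greek), "".join(greek))
```

The characters found are alpha, eta and omega, in both cases, each carrying a breathing, an accent and U+0345 COMBINING GREEK YPOGEGRAMMENI.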

Richard.
Received on Sun Aug 16 2015 - 13:28:32 CDT
