RE: APL Under-bar Characters

From: <alexweiner_at_alexweiner.com>
Date: Sun, 16 Aug 2015 12:41:58 -0700
Khaled,

A question just occurred to me. Although it looks like you may not know the answer, I would like it to reach the mailing list. It concerns this exchange:

> Now, the ä character has a precomposed form in Unicode, and if you couple that
> with the NFC normalisation form, you'd get the above expression to return 1.


> So I'm not sure why the allowance was made for ä as well as certain other
> characters, but not for other things (under-bar characters) that face
> similar representation issues.

It was encoded for compatibility with pre-existing character sets AFAIK.

Regards,
Khaled


As far as I know, APL predates the Unicode Consortium. Do you think the Consortium may have overlooked the pre-existing APL under-bar character set?



-------- Original Message --------
Subject: Re: APL Under-bar Characters
From: Khaled Hosny <khaledhosny@eglug.org>
Date: Sun, August 16, 2015 9:53 am
To: alexweiner@alexweiner.com
Cc: unicode@unicode.org

On Sun, Aug 16, 2015 at 09:31:25AM -0700, alexweiner@alexweiner.com wrote:
> Khaled,
> Thank you for the link. The normalization methods were already discussed,
> specifically here:
>
> http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00047.html

Grapheme cluster boundary detection is different from normalisation;
please read the link I provided.

> That thread discusses the question of how "big" ä is. The answer there is
> that it is one symbol, because the Unicode Consortium also encoded it as its
> own standalone character. From the thread:
>
> I'll give you an example. What would you want ⍴,'ä' to be?
>
> Right now, that could return either 1 or 2 depending on whether the ä was using
> the precomposed character (U+00E4) or the combining mark (U+0061, U+0308).
> Visually, these are identical, and generally you'd expect them to compare
> equal.
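To make the ⍴,'ä' example concrete, here is a minimal Python sketch. Python is used only for illustration (it is not how GNU APL implements ⍴); the assumption is that `len` on a `str`, which counts code points, behaves like ⍴ on a character vector that holds one element per code point.

```python
import unicodedata

precomposed = "\u00e4"   # U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # U+0061 'a' followed by U+0308 COMBINING DIAERESIS

# Counting code points gives different answers for visually identical text.
print(len(precomposed), len(decomposed))   # 1 2
print(precomposed == decomposed)           # False

# After NFC normalisation both strings become the single precomposed code point.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b, len(nfc_b))          # True 1
```

NFC only helps here because a precomposed ä exists to compose into.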

If you are counting grapheme clusters, then the answer is one in both
cases.

> In Unicode, equivalent strings that use different code points are compared by
> first applying a normalisation step. There are four normalisation forms
> (NFC, NFD, NFKC and NFKD), each with different behaviour.
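A small sketch of the four forms, again in Python purely for illustration, printing the code points each form produces. The two sample strings are chosen here for illustration only: ä has a canonical decomposition, while the ﬁ ligature (U+FB01) has only a compatibility decomposition.

```python
import unicodedata

samples = {
    "a-umlaut (precomposed)": "\u00e4",  # canonical decomposition: a + U+0308
    "fi ligature": "\ufb01",             # compatibility decomposition only: f + i
}

for label, text in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        result = unicodedata.normalize(form, text)
        codepoints = " ".join(f"U+{ord(c):04X}" for c in result)
        print(f"{label:24} {form:5} {codepoints}")
```

For ä, NFC and NFKC give U+00E4 while NFD and NFKD give U+0061 U+0308. For the ligature, only the compatibility forms (NFKC, NFKD) replace it with U+0066 U+0069; NFC and NFD leave U+FB01 unchanged.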

Quoting from the link I provided:

A key feature of default Unicode grapheme clusters (both legacy and
extended) is that they remain unchanged across all canonically
equivalent forms of the underlying text. Thus the boundaries remain
unchanged whether the text is in NFC or NFD. Using a grapheme
cluster as the fundamental unit of matching thus provides a very
clear and easily explained basis for canonically equivalent
matching. This is important for applications from searching to
regular expressions.

See also: http://unicode.org/faq/char_combmark.html#7
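The invariance described in the UAX #29 passage above can be illustrated with a deliberately simplified counter. This is a Python sketch under a stated assumption, not an implementation of UAX #29: it treats any code point whose general category is a combining mark (Mn, Mc, Me) as extending the previous cluster, and it ignores the other segmentation rules (Hangul syllables, emoji ZWJ sequences, CR LF, and so on). The helper name `approx_grapheme_count` is hypothetical.

```python
import unicodedata

def approx_grapheme_count(text: str) -> int:
    """Approximate count of user-perceived characters.

    Simplification, NOT full UAX #29: a combining mark (category Mn, Mc
    or Me) attaches to the preceding cluster; every other code point
    starts a new cluster.
    """
    count = 0
    for ch in text:
        if count == 0 or unicodedata.category(ch) not in ("Mn", "Mc", "Me"):
            count += 1
    return count

word = "na\u00efve"  # "naïve" written with precomposed U+00EF
for form in ("NFC", "NFD"):
    normalized = unicodedata.normalize(form, word)
    print(form, len(normalized), approx_grapheme_count(normalized))
# NFC 5 5
# NFD 6 5
```

The code point count changes between NFC and NFD, but the cluster count does not.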

> Now, the ä character has a precomposed form in Unicode, and if you couple that
> with the NFC normalisation form, you'd get the above expression to return 1.
>
>
> So I'm not sure why the allowance was made for ä as well as certain other
> characters, but not for other things (under-bar characters) that face
> similar representation issues.

It was encoded for compatibility with pre-existing character sets AFAIK.

Regards,
Khaled


>
>
> -------- Original Message --------
> Subject: Re: APL Under-bar Characters
> From: Khaled Hosny <khaledhosny@eglug.org>
> Date: Sun, August 16, 2015 8:17 am
> To: alexweiner@alexweiner.com
> Cc: unicode@unicode.org
>
> On Sun, Aug 16, 2015 at 07:35:17AM -0700, alexweiner@alexweiner.com wrote:
> > Hello Unicode Mailing List,
> >
> > There is significant discussion on the GNU APL mailing list about the
> > problems of adding capital letters with individual under-bars.
> >
> > http://lists.gnu.org/archive/html/bug-apl/2015-08/msg00050.html
> >
> > In short, it comes down to the following problem:
> >
> > The string length functionality would view an 'A' code point combined
> > with a '_' code point as an item that has two elements, while something
> > that looks like a single under-barred 'A' should be atomic and return a
> > length of one.
>
> I think what you need is better “character” counting [1], rather than
> new precomposed characters.
>
> Regards,
> Khaled
>
> 1. http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
>
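Applied to the under-bar problem from the original question, the "better character counting" Khaled suggests might look like the following Python sketch. Several assumptions apply: the under-bar is encoded as U+0332 COMBINING LOW LINE (not the spacing '_' U+005F, which is punctuation and would remain a separate character even under UAX #29); the name "A̲B̲C" is a hypothetical example; and `approx_grapheme_count` is a hypothetical, simplified stand-in for full UAX #29 segmentation.

```python
import unicodedata

def approx_grapheme_count(text: str) -> int:
    """Simplified grapheme-cluster count (NOT full UAX #29):
    combining marks (Mn, Mc, Me) extend the previous cluster."""
    count = 0
    for ch in text:
        if count == 0 or unicodedata.category(ch) not in ("Mn", "Mc", "Me"):
            count += 1
    return count

# Hypothetical under-barred name "A̲B̲C", assuming U+0332 COMBINING LOW LINE.
name = "A\u0332B\u0332C"
nfc_name = unicodedata.normalize("NFC", name)

print(len(name))                     # 5 code points
print(len(nfc_name))                 # still 5: no precomposed A/B with low line exists
print(approx_grapheme_count(name))   # 3 user-perceived characters

# A spacing underscore is not a combining mark, so it counts separately.
print(approx_grapheme_count("A_"))   # 2
```

Normalisation cannot shorten the under-barred letters because Unicode has no precomposed forms for them. Cluster counting gives the expected length without requiring new characters.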
Received on Sun Aug 16 2015 - 14:42:59 CDT