Re: Bit arithmetic on Unicode characters?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 7 Oct 2016 08:14:07 +0100

On Thu, 6 Oct 2016 21:18:15 -0400
Oren Watson <oren.watson_at_gmail.com> wrote:

> On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham <
> richard.wordingham_at_ntlworld.com> wrote:

> > Yes, it's a trade-off. The application I had in mind is converting
> > between mathematical letter variants and their 'plain' forms.
> > Perhaps there is just enough information in the UCD to allow
> > exhaustive, automated tests.

> That application is hindered by the fact that
>
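> U+1D506, U+1D50B, U+1D50C, U+1D515, U+1D51D, U+1D53A, U+1D53F,
> U+1D545, U+1D547, U+1D548, U+1D549, U+1D551, U+1D49D, U+1D4A0,
> U+1D4A1, U+1D4A3, U+1D4A4, U+1D4A7, U+1D4A8, U+1D4AD, U+1D4BA,
> U+1D4BC and U+1D4C4 are unallocated
> characters, forming gaps in the otherwise contiguous mathematical
> alphabets.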

(Aside: That written statement is illegal! -:)
 
Yep. It's a known nuisance, which is why I suggested exhaustive tests.
My email client found a font to render U+1D547 as the unwary
would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK
CAPITAL P. I was surprised when I first saw those gaps; I would have
expected characters with appropriate singleton decompositions to protect
the unwary. (The idea might have come up at the time of encoding, and
been dismissed with reasons.) I don't know whether the font's
misrendering is an accident or is deliberate partial protection of the
victims of bad character code selection.
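
To illustrate the kind of exhaustive test meant here, the following is a
minimal sketch only. It assumes Python's standard unicodedata module as
a stand-in for the UCD, and the helper names (naive_double_struck,
plain_form, find_gaps) are made up for this example. It maps A-Z into
the double-struck capitals by pure offset arithmetic and then asks the
decomposition data whether each result really maps back to the letter
it came from.

```python
import unicodedata

# U+1D538 MATHEMATICAL DOUBLE-STRUCK CAPITAL A, start of the run.
DS_CAPITAL_A = 0x1D538


def naive_double_struck(letter):
    """Map 'A'..'Z' into the double-struck run by offset arithmetic alone."""
    return chr(DS_CAPITAL_A + ord(letter) - ord("A"))


def plain_form(ch):
    """Return the plain letter given by a <font> decomposition, else None."""
    parts = unicodedata.decomposition(ch).split()
    if len(parts) == 2 and parts[0] == "<font>":
        return chr(int(parts[1], 16))
    return None  # unassigned, or no singleton <font> mapping


def find_gaps():
    """List the letters for which the arithmetic mapping cannot be trusted."""
    return [
        letter
        for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        if plain_form(naive_double_struck(letter)) != letter
    ]


if __name__ == "__main__":
    for letter in find_gaps():
        bad = naive_double_struck(letter)
        print(f"{letter}: U+{ord(bad):04X} is {unicodedata.category(bad)}")
```

The letters it reports are those whose double-struck forms were already
encoded in the Letterlike Symbols block, so a converter needs an
exception table for them rather than arithmetic. The same loop could be
run over each of the other mathematical alphabets.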

An old application of arithmetic was transliteration between the
major Indian Indic scripts. That falls foul of Tamil and of characters
that were not represented in ISCII.
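
A rough sketch of that arithmetic and of how it fails, again assuming
Python's unicodedata module; the BLOCK_START table and the function
names are hypothetical, chosen only for this illustration. It relies on
the ISCII-derived blocks sharing a common 128-code-point layout, and it
flags any result that lands on an unassigned code point.

```python
import unicodedata

# First code point of a few ISCII-derived blocks (each 0x80 long).
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Tamil": 0x0B80,
}


def shift_script(text, source, target):
    """Transliterate by adding the difference between block starts."""
    lo = BLOCK_START[source]
    delta = BLOCK_START[target] - lo
    out = []
    for ch in text:
        if lo <= ord(ch) < lo + 0x80:
            out.append(chr(ord(ch) + delta))
        else:
            out.append(ch)  # leave characters outside the source block alone
    return "".join(out)


def unassigned(text):
    """Report code points in text that are unassigned (category Cn)."""
    return [f"U+{ord(c):04X}" for c in text if unicodedata.category(c) == "Cn"]


if __name__ == "__main__":
    ka_kha = "\u0915\u0916"  # DEVANAGARI LETTER KA, LETTER KHA
    print(unassigned(shift_script(ka_kha, "Devanagari", "Bengali")))
    print(unassigned(shift_script(ka_kha, "Devanagari", "Tamil")))
```

Tamil has no letter KHA, so the shifted KHA falls into a hole in the
Tamil block, while the Bengali result is fully assigned. This check only
catches unassigned targets; it says nothing about characters that are
assigned but do not correspond across scripts.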

Richard.
Received on Fri Oct 07 2016 - 02:14:34 CDT
