Re: Bit arithmetic on Unicode characters? from Oren Watson on 2016-10-06 (Unicode Mail List Archive)

From: Oren Watson <oren.watson_at_gmail.com>
Date: Thu, 6 Oct 2016 21:18:15 -0400

That application is hindered by the fact that

𝔆𝔋𝔌𝔕𝔝𝔺𝔿𝕅𝕇𝕈𝕉𝕑𝒝𝒠𝒡𝒣𝒤𝒧𝒨𝒭𝒺𝒼𝓄

are unassigned code points (the corresponding letters were already
encoded in the Letterlike Symbols block), forming gaps in the otherwise
contiguous mathematical alphabets.
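
To make the gaps concrete, here is a minimal Python sketch. It assumes
the Unicode data bundled with the running Python interpreter, and the
helper names (naive_script_capital, is_unassigned, to_plain) are made up
for illustration. It applies plain offset arithmetic to the script
capitals and reports where that lands on an unassigned code point, then
uses NFKC normalization as one possible way back to the plain letters.

```python
import unicodedata

SCRIPT_CAPITAL_A = 0x1D49C  # U+1D49C MATHEMATICAL SCRIPT CAPITAL A

def naive_script_capital(letter):
    """Hypothetical helper: pure offset arithmetic, assuming A..Z is contiguous."""
    return chr(SCRIPT_CAPITAL_A + ord(letter) - ord("A"))

def is_unassigned(ch):
    """True if the running Python's Unicode database has no character here."""
    return unicodedata.category(ch) == "Cn"

def to_plain(text):
    """Map styled letters back to plain ones via compatibility normalization."""
    return unicodedata.normalize("NFKC", text)

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Letters for which the offset computation hits a hole in the block.
holes = [c for c in ALPHABET if is_unassigned(naive_script_capital(c))]

if __name__ == "__main__":
    print("offset arithmetic fails for:", "".join(holes))
    # U+212C SCRIPT CAPITAL B lives in Letterlike Symbols, not at U+1D49D.
    print(to_plain("\U0001D49C\u212C\U0001D49E"))
```

The holes reported for this alphabet should correspond to the script
capitals in the list above. Going from styled to plain via NFKC
sidesteps the gaps, but the plain-to-styled direction still needs an
exception table for each hole.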

On Thu, Oct 6, 2016 at 8:28 PM, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> On Thu, 6 Oct 2016 16:54:21 -0700
> Ken Whistler <kenwhistler_at_att.net> wrote:
>
> > On 10/6/2016 4:32 PM, Richard Wordingham wrote:
> > > The
> > > problem is that manually constructed lookup tables are prone to
> > > human error.
> >
> > ... as are manually constructed algorithms that attempt to take
> > advantage of sub-ranges of case pair adjacency in the Unicode code
> > charts to do casing with bit arithmetic.
>
> Yes, it's a trade-off. The application I had in mind is converting
> between mathematical letter variants and their 'plain' forms. Perhaps
> there is just enough information in the UCD to allow exhaustive,
> automated tests.
>
> For _simple_ case folding, algorithmic case folding can be expanded to
> a list of range tests, generalising what is often done for ASCII.
> Obviously the testing should be repeated with each new version of
> Unicode, which is straightforward if the case folding is compliant with
> Unicode. (Turkish would be a reason for not being compliant.)
>
> Richard.
>
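
The range-test approach quoted above, together with an exhaustive check,
could be sketched roughly as follows. This is only an illustration under
stated assumptions: the table FOLD_RANGES is hand-written and covers just
a few blocks, the function names are hypothetical, and Python's
str.lower() on a single character stands in for the simple case folding
data in the UCD's CaseFolding.txt (the two agree on the ranges listed
here, but not everywhere).

```python
# (first, last, delta): add delta to every code point in first..last.
# Hand-written table covering only a few blocks; an assumption for illustration.
FOLD_RANGES = [
    (0x0041, 0x005A, 32),  # ASCII A-Z
    (0x00C0, 0x00D6, 32),  # Latin-1 capitals before U+00D7 MULTIPLICATION SIGN
    (0x00D8, 0x00DE, 32),  # Latin-1 capitals after it
    (0x0391, 0x03A1, 32),  # Greek Alpha to Rho
    (0x03A3, 0x03AB, 32),  # Greek Sigma onwards (U+03A2 is unassigned)
    (0x0400, 0x040F, 80),  # Cyrillic U+0400..U+040F
    (0x0410, 0x042F, 32),  # Cyrillic A to Ya
]

def simple_fold(cp, ranges=FOLD_RANGES):
    """Fold one code point using nothing but range tests and an offset."""
    for first, last, delta in ranges:
        if first <= cp <= last:
            return cp + delta
    return cp

def check_ranges(ranges=FOLD_RANGES):
    """Return every code point in the table that disagrees with str.lower()."""
    mismatches = []
    for first, last, _ in ranges:
        for cp in range(first, last + 1):
            if chr(simple_fold(cp, ranges)) != chr(cp).lower():
                mismatches.append(hex(cp))
    return mismatches

if __name__ == "__main__":
    print("mismatches in table:", check_ranges())
    # A careless table that forgets U+00D7 is caught by the same check.
    print("mismatches in sloppy table:", check_ranges([(0x00C0, 0x00DE, 32)]))
```

Rerunning such a check against each new Unicode version is cheap, which
is the point made in the quoted message. A fuller test would compare
against CaseFolding.txt itself and also confirm that code points outside
the table are left alone.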
Received on Thu Oct 06 2016 - 20:18:52 CDT
