From: Doug Ewell <doug_at_ewellic.org>

Date: Fri, 07 Oct 2016 09:06:31 -0700

Richard Wordingham wrote:

> Yes, it's a trade-off. The application I had in mind is converting
> between mathematical letter variants and their 'plain' forms.

Long-time list members might remember a Windows utility I wrote to
convert between normal Unicode text and Mathematical Alphanumeric
Symbols (MAS). Andrew West (of BabelPad fame) has a similar web-based
app that also supports things like small caps and superscripts.

Both of these use lookup tables to do the conversions, and use
algorithms only for very broad operations, like distinguishing the
Latin-letter range in the MAS block from the Greek letters and the
digits. There's no practical value in implementing conversions like
this algorithmically. Maybe if there were only one or two exceptions in
the MAS range instead of two dozen (the reserved code points whose
letters were already encoded in the Letterlike Symbols block), it might
be different.
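
For illustration, here is a minimal Python sketch of that table-driven
approach. It is not the code of either utility. The names `to_math`,
`from_math`, `LATIN_BASE`, `DIGIT_BASE` and `HOLES` are hypothetical,
the style labels are informal, and Greek is left out. Simple arithmetic
covers each style's 52-letter Latin run and the five digit runs. A
24-entry table then patches the reserved code points with their
Letterlike Symbols counterparts.

```python
# Hypothetical sketch: style ASCII letters/digits as Mathematical
# Alphanumeric Symbols and back again. Greek is omitted for brevity.

# First code point of each style's Latin run (A-Z followed by a-z).
LATIN_BASE = {
    "bold": 0x1D400, "italic": 0x1D434, "bold italic": 0x1D468,
    "script": 0x1D49C, "bold script": 0x1D4D0, "fraktur": 0x1D504,
    "double-struck": 0x1D538, "bold fraktur": 0x1D56C,
    "sans-serif": 0x1D5A0, "sans-serif bold": 0x1D5D4,
    "sans-serif italic": 0x1D608, "sans-serif bold italic": 0x1D63C,
    "monospace": 0x1D670,
}

# Only five styles have their own digits 0-9.
DIGIT_BASE = {
    "bold": 0x1D7CE, "double-struck": 0x1D7D8, "sans-serif": 0x1D7E2,
    "sans-serif bold": 0x1D7EC, "monospace": 0x1D7F6,
}

# The 24 reserved slots in the Latin runs, mapped to the Letterlike
# Symbols characters that are used in their place.
HOLES = {
    # italic h -> PLANCK CONSTANT
    0x1D455: 0x210E,
    # script capitals B E F H I L M R
    0x1D49D: 0x212C, 0x1D4A0: 0x2130, 0x1D4A1: 0x2131, 0x1D4A3: 0x210B,
    0x1D4A4: 0x2110, 0x1D4A7: 0x2112, 0x1D4A8: 0x2133, 0x1D4AD: 0x211B,
    # script smalls e g o
    0x1D4BA: 0x212F, 0x1D4BC: 0x210A, 0x1D4C4: 0x2134,
    # fraktur capitals C H I R Z
    0x1D506: 0x212D, 0x1D50B: 0x210C, 0x1D50C: 0x2111,
    0x1D515: 0x211C, 0x1D51D: 0x2128,
    # double-struck capitals C H N P Q R Z
    0x1D53A: 0x2102, 0x1D53F: 0x210D, 0x1D545: 0x2115, 0x1D547: 0x2119,
    0x1D548: 0x211A, 0x1D549: 0x211D, 0x1D551: 0x2124,
}

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_math(text: str, style: str) -> str:
    """Map ASCII letters (and digits, if the style has them) to `style`."""
    out = []
    for ch in text:
        cp = None
        if "A" <= ch <= "Z":
            cp = LATIN_BASE[style] + ord(ch) - ord("A")
        elif "a" <= ch <= "z":
            cp = LATIN_BASE[style] + 26 + ord(ch) - ord("a")
        elif "0" <= ch <= "9" and style in DIGIT_BASE:
            cp = DIGIT_BASE[style] + ord(ch) - ord("0")
        if cp is None:
            out.append(ch)  # everything else passes through unchanged
        else:
            out.append(chr(HOLES.get(cp, cp)))  # patch reserved slots
    return "".join(out)


# Reverse lookup table, built from the same data as the forward mapping.
FROM_MATH = {}
for base in LATIN_BASE.values():
    for i, letter in enumerate(LETTERS):
        FROM_MATH[chr(HOLES.get(base + i, base + i))] = letter
for base in DIGIT_BASE.values():
    for i in range(10):
        FROM_MATH[chr(base + i)] = str(i)


def from_math(text: str) -> str:
    """Map styled letters and digits back to their 'plain' forms."""
    return "".join(FROM_MATH.get(ch, ch) for ch in text)
```

For example, `to_math("P", "double-struck")` yields ℙ U+2119 rather than
the reserved U+1D547, and `from_math` reverses the mapping.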

> Perhaps there is just enough information in the UCD to allow
> exhaustive, automated tests.

I can't find anything in the UCD, apart from the character names, that
distinguishes one "font variant" from another (UnicodeData.txt shown as
an example):

1D400;MATHEMATICAL BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D434;MATHEMATICAL ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D468;MATHEMATICAL BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D49C;MATHEMATICAL SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D4D0;MATHEMATICAL BOLD SCRIPT CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D504;MATHEMATICAL FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D538;MATHEMATICAL DOUBLE-STRUCK CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D56C;MATHEMATICAL BOLD FRAKTUR CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5A0;MATHEMATICAL SANS-SERIF CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D5D4;MATHEMATICAL SANS-SERIF BOLD CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D608;MATHEMATICAL SANS-SERIF ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D63C;MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
1D670;MATHEMATICAL MONOSPACE CAPITAL A;Lu;0;L;<font> 0041;;;;N;;;;;
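
Python's standard `unicodedata` module exposes the same UCD fields, so
this can be checked mechanically. In the minimal sketch below, the list
name `STYLED_A` is just an illustrative label for the thirteen code
points in the lines above. Every variant has the same `<font> 0041`
decomposition and the same NFKC result, which leaves the character name
as the only place the style is recorded. The exact data depends on the
Unicode version bundled with the Python build, although these characters
date from Unicode 3.1.

```python
import unicodedata

# Capital A in each of the 13 MAS styles, as in the UnicodeData.txt lines.
STYLED_A = [0x1D400, 0x1D434, 0x1D468, 0x1D49C, 0x1D4D0, 0x1D504, 0x1D538,
            0x1D56C, 0x1D5A0, 0x1D5D4, 0x1D608, 0x1D63C, 0x1D670]

# The decomposition field and the NFKC form are identical for all of them...
decomps = {unicodedata.decomposition(chr(cp)) for cp in STYLED_A}
nfkc = {unicodedata.normalize("NFKC", chr(cp)) for cp in STYLED_A}
# ...while each name is different.
names = [unicodedata.name(chr(cp)) for cp in STYLED_A]

print(decomps)  # {'<font> 0041'}
print(nfkc)     # {'A'}
for name in names:
    print(name)
```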

And that's probably as it should be, because the UTC never intended MAS
to be readily transformed to and from "plain" characters. They're meant
to be used in mathematical expressions, where the style of a letter
carries its own meaning. (My utility and, I'm sure, Andrew's were
written entirely tongue-in-cheek.)

> My email client found a font to render U+1D547 as the unwary
> would expect, i.e. using a glyph suitable for ℙ U+2119 DOUBLE-STRUCK
> CAPITAL P. I was surprised when I first saw those gaps; I would have
> expected characters with appropriate singleton decompositions to protect
> the unwary. (The idea might have come up at the time of encoding, and
> been dismissed with reasons.)

Unifying identical characters with identical meanings, rather than
creating pointless duplicates, was a major design tenet of Unicode.
DOUBLE-STRUCK CAPITAL P already existed at U+2119, so it was not encoded
a second time at U+1D547.
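
The result of that unification is visible in the UCD itself. A minimal
Python check, using only the standard `unicodedata` module (the variable
names are illustrative), lists which double-struck capitals have
reserved (`Cn`) slots in the MAS run. It also shows that U+2119 carries
the same kind of `<font>` decomposition as the MAS letters do.

```python
import unicodedata

DOUBLE_STRUCK_A = 0x1D538  # MATHEMATICAL DOUBLE-STRUCK CAPITAL A

# Capitals whose slot in the double-struck run is unassigned (category Cn).
gaps = [chr(ord("A") + i) for i in range(26)
        if unicodedata.category(chr(DOUBLE_STRUCK_A + i)) == "Cn"]
print(gaps)  # ['C', 'H', 'N', 'P', 'Q', 'R', 'Z']

# U+1D547 has no name; the unified character is U+2119.
print(unicodedata.name(chr(0x1D547), None))  # None
print(unicodedata.name("\u2119"))            # DOUBLE-STRUCK CAPITAL P
print(unicodedata.decomposition("\u2119"))   # <font> 0050
```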

> I don't know whether the font's misrendering is an accident or is
> deliberate partial protection of the victims of bad character code
> selection.

Either way, it's a bug. Users who try to render an unassigned code point
should not be "protected" by showing them a glyph that the font designer
thought should be there. They should be shown a .notdef glyph so they
know something is wrong.

-- Doug Ewell | Thornton, CO, US | ewellic.org

Received on Fri Oct 07 2016 - 11:07:08 CDT
