Re: Gaps in Mathematical Alphanumeric Symbols

From: Asmus Freytag (t) <asmus-inc_at_ix.netcom.com>
Date: Thu, 10 Mar 2016 08:43:29 -0800
On 3/9/2016 7:08 PM, Oren Watson wrote:
I was surprised to find out that there are gaps in the Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF). The gaps are associated with the inclusion of similar symbols in other blocks, chiefly the Letterlike Symbols block.
Correct.

Examples of such gaps include U+1D49D, U+1D506, etc.

But as a matter of convenience and simplicity,

As a matter of history, the characters that would have gone into those gaps were already encoded.
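
The gaps can be listed mechanically. What follows is only a minimal sketch, assuming Python's standard unicodedata module, whose data reflects whatever Unicode version the interpreter ships with. The helper name find_gaps is made up for illustration. It lists the code points in U+1D400..U+1D7FF that have no character name, then prints the names of two Letterlike Symbols characters that already cover the letters "missing" at U+1D49D and U+1D506. The list may also include unassigned positions that are not tied to Letterlike Symbols.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x1D400, 0x1D7FF  # Mathematical Alphanumeric Symbols


def find_gaps(start=BLOCK_START, end=BLOCK_END):
    """Return the code points in [start, end] that have no character name."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.name(chr(cp), None) is None]


gaps = find_gaps()
print(" ".join(f"U+{cp:04X}" for cp in gaps))

# The script B and fraktur C that would have filled U+1D49D and U+1D506
# were encoded earlier, in Letterlike Symbols:
for cp in (0x212C, 0x212D):
    print(f"U+{cp:04X}", unicodedata.name(chr(cp)))
```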

The stated purpose for alphanumerics in math is to serve as variables. For example, that means they are not intended to be used as list markers, which would have been a use case for which a contiguous range would be essential. Variable names are not usually indexed, but if they must show up in sorted lists, any capable sort algorithm can be set up so the weights make them contiguous across the gap (if the UCA tables do not do that by default already, perhaps it's worth ensuring that they do).
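
As a rough illustration of sorting across the gap: Python's standard library has no UCA implementation, so the sketch below substitutes a much cruder technique, folding each string with NFKC before comparing. That is an assumption made for the example and not how UCA weights work. The name fold_key is hypothetical.

```python
import unicodedata


def fold_key(s):
    """Sort key that folds font variants to their base letters via NFKC."""
    return unicodedata.normalize("NFKC", s)


script_a = "\U0001D49C"  # MATHEMATICAL SCRIPT CAPITAL A
script_b = "\u212C"      # SCRIPT CAPITAL B (Letterlike Symbols)
script_c = "\U0001D49E"  # MATHEMATICAL SCRIPT CAPITAL C

names = [script_c, script_b, script_a]

# Plain code point order puts B first, because U+212C < U+1D49C.
by_code_point = sorted(names)

# With the folding key the three letters sort as A, B, C.
by_folded = sorted(names, key=fold_key)
```

A real collation library with tailorable weights would be the proper tool. The point here is only that the gap need not show up in sorted output.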

these missing codepoints could have been defined as decomposing directly to the equivalents in Letterlike Symbols, in the same manner that the Ångström sign decomposes to the letter Å. That would make these ranges contiguous.

The original case for the Ångström, as for the Kelvin, was that it had been encoded twice in some other standards. The historical mistake was to not code them as part of the "squared" abbreviations, because that's where they came from, in the mistaken belief that it would be generally useful to have these and not the regular Å and K for the units.

None of that applies to the alphanumerics, so it's good to have avoided the duplicate encoding.
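
The decomposition behaviour discussed here can be inspected directly. This is a small sketch, assuming Python's standard unicodedata module. It contrasts the canonical mappings of the Ångström and Kelvin signs with the compatibility (<font>) mapping that a Letterlike character such as SCRIPT CAPITAL B carries.

```python
import unicodedata

angstrom_sign = "\u212B"  # ANGSTROM SIGN
kelvin_sign = "\u212A"    # KELVIN SIGN
script_b = "\u212C"       # SCRIPT CAPITAL B

# Canonical mappings carry no <tag>; compatibility mappings do.
for ch in (angstrom_sign, kelvin_sign, script_b):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch),
          "->", unicodedata.decomposition(ch))

# NFC replaces a character that has a canonical singleton mapping...
nfc_angstrom = unicodedata.normalize("NFC", angstrom_sign)
nfc_kelvin = unicodedata.normalize("NFC", kelvin_sign)

# ...but leaves a character with only a compatibility mapping alone.
nfc_script_b = unicodedata.normalize("NFC", script_b)
```

A character with a canonical mapping does not survive NFC, so gap fillers defined the way the Ångström sign is would be replaced by their Letterlike equivalents whenever text is normalized.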

Is there a policy about leaving gaps in otherwise contiguous ranges of codepoints?

I believe UTC tends to avoid gaps, but will leave them if the circumstances of the case warrant that. In this case, not leaving gaps, and silently skipping already encoded characters, would have had the effect of misleading users into expecting a complete alphabet, so the gap was the less-bad alternative.

A./
Received on Thu Mar 10 2016 - 10:44:10 CST
