Marking up hexadecimal numbers (was: Re: a character for an unknown character) from Marcel Schneider on 2017-01-02 (Unicode Mail List Archive)

From: Marcel Schneider <charupdate_at_orange.fr>
Date: Mon, 2 Jan 2017 19:19:04 +0100 (CET)

On Sat, 31 Dec 2016 22:04:02 +0100 (CET), I wrote:
> On Sat, 31 Dec 2016 11:01:16 +0100, Christoph Päper wrote:
> >
> > Richard Wordingham :
> > >
> > >> Perhaps the letters for hexadecimal digits should have been encoded
> > >> separately?
> > >
> > > The idea has been rejected several times.
> >
> > It has indeed. That’s why two different technologies have to be used to get
> > typographically harmonic hexadecimal numbers, e.g. in CSS:
> >
> > .hex {font-variant-numeric: oldstyle-nums; text-transform: lowercase;}
> > .hex {font-variant-numeric: lining-nums; text-transform: uppercase;}
> >
> > This works well enough for ‘01ef’ or ‘01EF’, but will fail for conventions like
> > ‘0x01ef’ and ‘01EFh’. Hence:
> >
> > .hex::before {content: "0x"; text-transform: none;}
> > .hex::after {content: "h"; text-transform: none;}
> > .hex::after {content: "ₕ";}
> > .hex::after {content: "16"; vertical-align: sub; font-size: smaller; line-height: normal;}
> > .hex::after {content: "16"; font-variant-position: sub;}
> > .hex::after {content: "₁₆";}
>
> Thank you for the code. I didnʼt know this, so Iʼve tried and found that
> the automatic prefixes/suffixes cannot be copied from the web page.
> That seems to me a disadvantage.
>
> Among the possibilities, you include Unicode subscripts. Is this current
> practice? That seems to me very interesting to follow up, as it documents
> that the stable representation scheme is already adopted. Iʼm curious to
> what extent it is so.
>
[…]
>
> I note that the "U+" prefix is missing in the list, obviously because it
> denotes more than just a hexadecimal number, and is to be hard-coded.
[…]

Alternatively, the CSS style derived from the above could be:

.unicode {font-variant-numeric: lining-nums; text-transform: uppercase;}
.unicode::before {content: "U+"; text-transform: none;}

But again, when the reader copies such a scalar value, it comes without the 'U+'.
Hence the idea that the '[[H]H]HHHH' could be parsed so that the prefix is added
right after the opening tag, which would make the second line above unnecessary.
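
To make the idea concrete, here is a minimal sketch in Python of what such a
parsing step might look like. The element and class name (<span class="unicode">)
and the function name add_u_plus are assumptions made for illustration only;
they do not come from any existing tool. The pattern accepts four to six hex
digits, following the '[[H]H]HHHH' scheme above.

```python
import re

# Assumption: code points are marked up as <span class="unicode">HHHH</span>.
# The element and class name are hypothetical, chosen only for this sketch.
UNICODE_SPAN = re.compile(
    r'(<span class="unicode">)([0-9A-Fa-f]{4,6})(</span>)'
)


def add_u_plus(html: str) -> str:
    """Insert a literal 'U+' right after the opening tag and uppercase the digits."""
    return UNICODE_SPAN.sub(
        lambda m: f"{m.group(1)}U+{m.group(2).upper()}{m.group(3)}",
        html,
    )


if __name__ == "__main__":
    print(add_u_plus('See <span class="unicode">1f600</span> for details.'))
```

Since the prefix then is part of the text content, it is copied along with the
digits, and the '::before' rule is no longer needed.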

Similarly, the 'HHHH' can be complemented with '₁₆',
or with '0x' or '\x' or whatever, as hard-coded additions by a parser.
This has IMO two advantages:

1) When the user copies hex numbers from the browser, hex numbers stay prefixed
or suffixed as such.

2) When the user pastes hex numbers into a text editor, theyʼre not messed up
(this applies to the '₁₆' suffix, as opposed to a '_{16}' markup suffix).
Otherwise, a hex number displayed as '1A19₁₆' is turned into '1A1916'.
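
The difference in point 2 can be illustrated with a small Python sketch. The
helper names with_base_suffix and strip_tags are hypothetical, and strip_tags
is only a crude stand-in for what is left of markup after pasting into a
plain-text editor; real browsers may behave differently in detail.

```python
import re

# Maps ASCII digits to the Unicode subscript digits U+2080..U+2089.
SUBSCRIPT_DIGITS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def with_base_suffix(digits: str, base: int = 16) -> str:
    """Append the base as Unicode subscript digits, e.g. '1A19' -> '1A19₁₆'."""
    return digits + str(base).translate(SUBSCRIPT_DIGITS)


def strip_tags(html: str) -> str:
    """Crude approximation of a plain-text paste: drop every tag, keep the text."""
    return re.sub(r"<[^>]+>", "", html)


if __name__ == "__main__":
    print(strip_tags("1A19<sub>16</sub>"))       # markup subscript is lost
    print(strip_tags(with_base_suffix("1A19")))  # Unicode subscript survives
```

With the markup subscript, the base runs into the digits once the tags are gone;
with the hard-coded subscript characters, the suffix stays distinct.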

The current policy is certainly based on classifying hexadecimal numbers (and
numbers in other non-decimal numeral systems) as mathematical notation rather
than technical notation. On a broad reading of TUS, all measurement units are
granted the use of the superscript digits '²' and '³'. Could this policy be
extended to include the subscripts '₁' and '₆'? This may seem an odd question, and
answering it in the affirmative might open the door to wider use of Latin
superscripts, first in historical data ('Vᵉ s.'), then in more general data.

As the upside I see content stability and streamlined input (provided that the
input interface is up to date). Disparity in display may be considered a downside,
since only fonts that have reduced capitals (Consolas, Lucida Console, Courier)
render modifier letters so that they accurately look like superscripts or ordinal
indicators. Iʼve got into the habit of using modifier letters in abbreviations,
and I find they look good in other fonts too.

Right now, all it takes is to put them on the keyboard and tell the user: “Please
use them if you are comfortable with them; the original encoding for phonetics
does not preclude re-use and diversification of usage conventions.” Some
explanation needs to be provided, because people who know something about Unicode
typically object with the sometimes passionate refrain that these characters are
for use in phonetics only.

Definitely, by the current wording of the relevant parts of the Unicode Standard,
Unicode is fueling its own misperception.

Some hints in the opposite direction, ideally in TUS 10.0, to be published this
year (2017), would (in my opinion) be highly appreciated. Though of course that is
not enough to make people really happy.

Marcel
Received on Mon Jan 02 2017 - 12:19:53 CST
