Re: A last missing link for interoperable representation

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Sat, 12 Jan 2019 10:50:00 -0800
On 1/12/2019 5:22 AM, Richard Wordingham via Unicode wrote:
> On Sat, 12 Jan 2019 10:57:26 +0000 (GMT)
> Julian Bradfield via Unicode <unicode@unicode.org> wrote:
>
>> It's also fundamentally misguided. When I _italicize_ a word, I am
>> writing a word composed of (plain old) letters, and then styling the
>> word; I am not composing a new and different word ("_italicize_") that
>> is distinct from the old word ("italicize") by virtue of being made up
>> of different letters.
> And what happens when you capitalise a word for emphasis or to begin a
> sentence?  Is it no longer the same word?

Typographically, the act of using italics or a different font weight is more akin to using a different font than to using different letters. Not only did old metal type require the creation of a separate font (albeit with a design coordinated with the regular type), but even in the digital world, purpose-designed italic and other typefaces beat attempts at parametrizing regular fonts. (Some of the intelligence that goes into creating those designs can nowadays be approximated by automation, though.)

What this teaches you is that italicizing (or boldfacing) text is fundamentally related to picking out parts of your text in a different font. It's an operation on a span of text, not something that results in different letters (or letter attributes).
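
To illustrate the "operation on a span" view, here is a minimal Python sketch of a hypothetical rich-text model in which styling is recorded as ranges over an unchanged plain-text string. The names `StyledText` and `Span` are invented for illustration and do not correspond to any particular library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A styled range [start, end) over the underlying text, plus a style name."""
    start: int
    end: int
    style: str


@dataclass
class StyledText:
    """Hypothetical rich-text model: plain letters plus a separate list of spans."""
    text: str
    spans: list = field(default_factory=list)

    def apply(self, start: int, end: int, style: str) -> None:
        # Styling only records a range; the letters themselves are not changed.
        if not (0 <= start <= end <= len(self.text)):
            raise ValueError("span out of range")
        self.spans.append(Span(start, end, style))

    def styles_at(self, index: int) -> set:
        # Collect every style whose span covers the character at `index`.
        return {s.style for s in self.spans if s.start <= index < s.end}


doc = StyledText("Please italicize this word.")
start = doc.text.index("italicize")
doc.apply(start, start + len("italicize"), "italic")

print(doc.text)              # Please italicize this word.
print(doc.styles_at(start))  # {'italic'}
print(doc.styles_at(0))      # set()
```

After styling, searching `doc.text` for "italicize" still succeeds: the italic word has not become a different string, only a range with an attribute.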

Deep in the age of metal type, this would have come as no surprise to users. As I have had occasion to mention before, some languages had the (rather universally observed) typographical convention of setting foreign terms apart by using a different font (Antiqua for foreign terms vs. Fraktur for ordinary text). At the same time, other languages used italics for the same purpose (which technically also meant using a different typeface).

To go further, the use of typography to mark emphasis also followed conventions that focused on spans of letters rather than on individual letters. For example, in Fraktur you could never have emphasized a single letter, because emphasis was conveyed by increased inter-letter spacing. (That restriction is less limiting than it appears for languages that have no single-letter words.)

Anyway, this points to a way to make the distinction between plain text and rich text a more principled one (and explains why math alphabets seemingly form an exception).

The domain of rich text comprises all typographic and stylistic elements that establish spans of text, whether underlining, emphasis, letter spacing, font weight, typeface selection or anything else. Plain text deals with letters in a way that is as stateless as possible, that is, it does not set up spans. Math alphabetics are an exception by virtue of the fact that they are individual letters with an identity different from the "same" letter in ordinary text or the "same" letter in a different math alphabet.
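
The separate identity of math letters is visible at the code-point level. The Python sketch below uses only the standard `unicodedata` module to show that a plain "A", MATHEMATICAL BOLD CAPITAL A and MATHEMATICAL FRAKTUR CAPITAL A are distinct characters with their own names, and that compatibility normalization (NFKC) folds them back to the plain letter, discarding precisely the distinction the math alphabets exist to carry.

```python
import unicodedata

plain_a = "A"             # U+0041
bold_a = "\U0001D400"     # U+1D400
fraktur_a = "\U0001D504"  # U+1D504

for ch in (plain_a, bold_a, fraktur_a):
    # Each is its own code point with its own character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# As strings they all differ: math bold A is not the plain A in a bold font.
print(len({plain_a, bold_a, fraktur_a}))  # 3

# NFKC folds the math letters back to the plain letter,
# discarding the distinction that the math alphabets encode.
print([unicodedata.normalize("NFKC", ch) for ch in (plain_a, bold_a, fraktur_a)])
# ['A', 'A', 'A']
```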

So those screen readers got it right, except that instead of rattling off the Unicode name, they could have used one of the typical notational conventions that the math alphabetics are used to express (e.g., "vector").
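
As a rough sketch of that alternative, the Python code below parses the Unicode name of a math alphanumeric letter and maps its style to a spoken word. The `SPOKEN_STYLE` table and the `speak` function are hypothetical: reading bold as "vector" follows only one common convention, and this is not how any actual screen reader works. A real reader would need context and user settings to choose well.

```python
import re
import unicodedata

# Hypothetical mapping from a math-alphabet style to what a reader might say.
SPOKEN_STYLE = {
    "BOLD": "vector",  # one common convention, not universal
    "DOUBLE-STRUCK": "double-struck",
    "FRAKTUR": "fraktur",
    "SCRIPT": "script",
}

# Matches names such as "MATHEMATICAL BOLD CAPITAL A" or "MATHEMATICAL BOLD SMALL V".
_MATH_NAME = re.compile(r"MATHEMATICAL (.+) (CAPITAL|SMALL) (\w+)")


def speak(ch: str) -> str:
    """Return a short spoken form for one character (illustrative only)."""
    name = unicodedata.name(ch, "")
    m = _MATH_NAME.fullmatch(name)
    if not m:
        # Not a math alphanumeric letter: just return the character itself.
        return ch
    style, case, letter = m.groups()
    spoken_style = SPOKEN_STYLE.get(style, style.lower())
    if case == "SMALL":
        letter = letter.lower()
    return f"{spoken_style} {letter}"


print(speak("\U0001D400"))  # vector A
print(speak("\U0001D42F"))  # vector v
print(speak("\U0001D538"))  # double-struck A
print(speak("x"))           # x
```

Note that some double-struck letters (such as U+211D DOUBLE-STRUCK CAPITAL R) live in the Letterlike Symbols block without the MATHEMATICAL prefix, so this sketch falls back to the bare character for them.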

To reiterate: if you effectively require a span (even if you could simulate it differently), you are in the realm of rich text. The one big exception is bidi, because it is utterly impossible to handle bidi text without text ranges. Therefore, Unicode plain text explicitly violates that principle in favor of achieving the fundamental goal of universality, namely being able to include the bidi languages.
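
To make the bidi exception concrete: since Unicode 6.3, plain text can carry directional isolates, where an opening control (LRI, RLI or FSI) and a closing PDI bracket a run of text, which is exactly a span. The sketch below uses Python's `unicodedata.bidirectional` to find matched isolate pairs. The helper `isolate_spans` is hypothetical and is a simplified illustration of the span structure, not an implementation of UAX #9: it ignores paragraph boundaries, the nesting-depth limit, and the older embedding and override controls that are closed by PDF.

```python
import unicodedata

# Bidi classes of the controls that open an isolate, and the one that closes it.
OPENERS = {"LRI", "RLI", "FSI"}
CLOSER = "PDI"


def isolate_spans(text: str) -> list:
    """Return (opener_index, closer_index) pairs for matched isolates."""
    stack, spans = [], []
    for i, ch in enumerate(text):
        bidi = unicodedata.bidirectional(ch)
        if bidi in OPENERS:
            stack.append(i)
        elif bidi == CLOSER and stack:
            # A PDI closes the most recently opened isolate.
            spans.append((stack.pop(), i))
        # A PDI with no open isolate is ignored.
    return spans


# A Hebrew word isolated inside an English sentence: U+2067 RLI ... U+2069 PDI.
text = "The title is \u2067\u05e9\u05dc\u05d5\u05dd\u2069 in Hebrew."
print(isolate_spans(text))  # [(13, 18)]
```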

None of the other uses contemplated here rise to the same level of violating a fundamental goal in the same way.

A./

Received on Sat Jan 12 2019 - 12:50:16 CST
