Re: A sign/abbreviation for "magister" from Asmus Freytag via Unicode on 2018-10-31 (Unicode Mail List Archive)

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Wed, 31 Oct 2018 17:21:08 -0700

On 10/31/2018 3:37 PM, Marcel Schneider via Unicode wrote:

On 31/10/2018 19:42, Asmus Freytag via Unicode wrote:

On 10/31/2018 11:10 AM, Marcel Schneider via Unicode wrote:

which, if my understanding of "convient" is correct, carefully does
[not] quite say that it is *wrong* not to superscript, but that one should
superscript when one can because that is the convention in typography.

Draft style may differ from mail style, and this, from typography, only 
due to the limitations imposed by input interfaces. These limitations are 
artificial and mainly the consequence of insufficient development of said 
interfaces. If the computer is good for anything, then that should also 
include the transition from typewriter fallbacks to the true digital 
representation of all natural languages. Latin not excluded.

It is a fallacy that all text output on a computer should match the convention 
of "fine typography".

Much that is written on computers represents an (unedited) first draft. Giving 
such texts an appearance that, in the days of hot metal typography, was reserved 
for texts that were fully edited and in many cases intended for posterity does 
the reader a disservice.

The disconnect is in many people believing the user should be disabled to write 
[prevented from writing] 
his or her language without disfiguring it by lack of decent keyboarding, and 
that such input should be considered standard for user input. Making such text 
usable for publishing needs extra work, that today many users cannot afford, 
while the mass of publishing has increased exponentially over the past decades. 
The result is garbage, following the rule of “garbage in, garbage out.”

No argument that there are some things that users cannot key in easily and that the common
fallbacks from the days of typewritten drafts are not really appropriate in many texts that
otherwise fall short of being "fine typography".

The real 
disservice to the reader is not to enable the inputting user to write his or her 
language correctly. A draft whose backbone is a string usable as-is for publishing
is not a disservice, but a service to the reader, paying the reader due respect. 
Such a draft is also a service to the user, enabling him or her to streamline the 
workflow. Such streamlining brings monetary and reputational benefit to the user.

I see a huge disconnect between "writing correctly" and "usable as-is for publishing". These
two things are not at all the same.

Publishing involves making many choices that simply aren't necessary for more "rough & ready"
types of texts. Not every Twitter or e-mail message needs to be "usable as-is for publishing", but
each should allow "correctly written" text as far as possible.

When "desktop publishing", as it was called then, became available, too many people started to
obsess over form rather than content. You would get these beautifully laid out documents, the contents
of which barely warranted calling them a first draft.


That disconnect seems to originate from the time when the computer became a tool 
empowering the user to write in all of the world’s languages thanks to Unicode.

No, this has nothing to do with Unicode / multi-script support.

The concept of “fine typography” was then used to draw a borderline between what 
the user is supposed to input, and what he or she needs to get for publication.

This same dividing line applies in English (or any of the other individual languages).

In the same move, that concept was extended in a way that it should include the 
quality of the string, additionally to what _fine typography_ really is: fine 
tuning of the page layout, such as vertical justification, slight variations in 
the width of non-breakable spaces, and of course, discretionary ligatures.

Certain elements of styling are also part of fine typography. In some cases, readying a "string"
for publication also means applying spelling conventions or grammatical conventions (for those
cases where there are ambiguities in the common language), or applying preferred word choices
or ways of formulating things that may be particular to individual publishers or types of publications.

Using HYPHEN-MINUS instead of "EN DASH" or "HYPHEN" is perfectly OK for early stages of
drafting a text. Attempting to follow those and similar conventions during that phase forces
the author to pay attention to the wrong thing - his or her focus should be on the ideas and
the content, not the form of the document.


Producing a plain text string usable for publishing was then put out of reach 
of most common mortals, by using the lever of deficient keyboarding, but also 
supposedly by an “encoding error” (scare quotes) in the line break property of 
U+2008 PUNCTUATION SPACE, that should be non-breakable like its siblings 
U+2007 FIGURE SPACE (still—as per UAX #14—recommended for use in numbers) and 
U+2012 FIGURE DASH to gain the narrow non-breaking space needed to space the 
triads in numbers using space as a group separator, and to space big punctuation 
in a Latin script using locale, where JTC1/SC2/WG2 had some meetings for the UCS:
French.
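
For readers who want to see the characters named above, here is a minimal Python sketch. It only looks up the names with the standard unicodedata module (which does not expose the Line_Break property, so the UAX #14 point itself is not demonstrated) and groups the triads of a number with U+202F. The helper name group_triads is hypothetical, chosen for illustration.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

def group_triads(n: int, sep: str = NNBSP) -> str:
    """Hypothetical helper: separate the triads of an integer with `sep`."""
    # Format with commas first, then swap the comma for the chosen space.
    return f"{n:,}".replace(",", sep)

# Print the code point and name of each character discussed above.
for ch in ("\u2007", "\u2008", "\u2012", "\u202f"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(group_triads(1234567))
```

Passing a different separator, for example group_triads(1234567, "\u2007"), gives the FIGURE SPACE variant for comparison.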

Those details should be handled in a post-processing phase for documents that are intended
for publication. One of the big problems in current architectures is that things like "autocorrect",
which attempt to overcome the limitations of the current keyboards, are applied at input time
only, and authors need to constantly interact with these helpers to make sure they don't
misfire. Much text that is laboriously prepared this way will not survive future revisions during
the editing process needed to get the *content* to publication quality.

All because users have no convenient tool to "touch up" these dashes, quotes, and spaces
in a later phase, at the same time they apply copy-editing, for example.
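
A "touch-up" pass of the kind described here could look roughly like the following Python sketch. It is only an illustration under naive assumptions: the function name touch_up and its three rules are hypothetical, quotes are paired purely by position, and every hyphen between two digits is treated as a range (which would be wrong for phone numbers or ISO dates).

```python
import re

EN_DASH = "\u2013"          # U+2013 EN DASH
LEFT_DQ, RIGHT_DQ = "\u201c", "\u201d"  # curly double quotes
APOSTROPHE = "\u2019"       # U+2019 RIGHT SINGLE QUOTATION MARK

def touch_up(text: str) -> str:
    """Hypothetical post-processing pass applied to a finished draft."""
    # HYPHEN-MINUS between two digits becomes EN DASH (naive range rule).
    text = re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)
    # A pair of straight double quotes on one line becomes curly quotes.
    text = re.sub(r'"([^"\n]*)"', LEFT_DQ + r"\1" + RIGHT_DQ, text)
    # A straight apostrophe between word characters becomes U+2019.
    text = re.sub(r"(?<=\w)'(?=\w)", APOSTROPHE, text)
    return text

print(touch_up('pages 10-12 of "the draft" don\'t count'))
```

The point of running this after drafting, rather than at input time, is that the author never has to watch an autocorrect helper while writing; the rules run once, at the copy-editing stage.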

For everybody having beneath his or her hands a keyboard whose layout driver is 
programmed in a fully usable way, the disconnect implodes. At encoding and input 
levels (the only ones that are really on-topic in this thread) the sorcery called 
fine typography then amounts to nothing more than having the keyboard insert 
fully diacriticized letters, right punctuation, accurate space characters, and 
superscript letters as ordinal indicators and abbreviation endings, depending 
on the requirements.

In the days of typewritten manuscripts you had to follow certain conventions that allowed the
typesetter to select the intended symbols and styled letters. I'm not arguing that we should
return to where such fallbacks are used. And certainly not arguing that we should be using
ASCII fallbacks for letters with diacritics, such as "oe" for "ö".

But many issues around selecting the precise type of space or dash are not so much issues
of correct content but precisely issues of typography.

Some occupy an intermediate level, where it would be quite appropriate to apply them to
many automatically generated texts. (I am aware of your efforts in CLDR to that effect). But
I still believe that they have no place in content focused writing.


Now was I talking about “all text output on a computer”? No, I wasn’t. 

The computer is able to accept input of publishing-ready strings, since we have 
Unicode. Precluding the user from using the needed characters by setting up 
caveats and prohibitions in the Unicode Standard seems to me nothing else than 
an outdated operating mode. U+202F NARROW NO-BREAK SPACE, encoded in 1999 for 
Mongolian [1][2], has been readily ripped off by the French graphic industry. 
In 2014, TUS started mentioning its use in French [3]; in 2018, it put it on 
top [4]. 
That seems to me a striking example of how things encoded for other purposes 
are reused (or following a certain usage, “abused”, “hacked”, “hijacked”) in 
locales like French. If it wasn’t an insult to minority languages, that 
language could be called, too, “digitally disfavored” in a certain sense.

On the other hand, I'm a firm believer in applying certain styling attributes 
to things like e-mail or discussion papers. Well-placed emphasis can make such 
texts more readable (without requiring that they pay attention to all other 
facets of "fine typography".)

The parenthesized sidenote (that is probably the intended main content…) makes 
this paragraph wrong. I’d buy it if either the parenthesis is removed or if it 
comes after the following.

Now you are copy-editing my e-mails. :)

I don't read or write French at a level where I can evaluate your contention that the language
is digitally disadvantaged.

To some extent, software will always reflect the biases of its creators, and in some subtle ways
these will end up in conflict with conventions in other languages. In some cases, conventions
applied by human typesetters cannot easily be duplicated by software that cannot recognize
the meaning of the text, and in some cases we have seen languages abandoning these
conventions in recent reforms in favor of a set of rules that are a bit more "mechanistic"
if you will.

In German, it used to be necessary to understand the word division to know whether or not
to apply a ligature. Some of the rules for combining words into compounds were changed
and that may have made that process more regular as well.

But still, forcing all users to become typesetters was one of the wrong turns taken during the
early development of publishing on computers. You seem to revel in knowing all the little
details in French usage, but I bet not even all educated French people reach your level.

The best keyboard drivers won't help. So the idea that every string is supposed to be
"publication-ready" remains a fallacy. However, there shouldn't be encoding obstacles
to creating publication-ready strings. (Whether created by copy-editors, typesetters, or
advanced tools that post-process draft texts).

If a Twitter message uses spaces around punctuation that are not the right width, who
cares; but if your copy-editor can't prepare a manuscript for publication because of software
limitations, that's a different can of worms.

A./


With due respect, I need to add that the disconnect in that is visible only to 
French readers. Without NNBSP, punctuation à la française in e-mails is messed 
up because even NBSP is ignored (I don’t know what exactly happens at the backend; 
anyway, at the frontend it’s like a normal space in at least one e-mail client and 
in several if not all browsers, and if pasted in plain text from MS Word, it’s 
truly replaced with SP). All that makes e-mails harder to read. Correct spacing 
with punctuation in French is often considered “fine-tuning”, but only if that 
punctuation spacing is not supported by the keyboard driver, and that’s still 
almost always the case, except on the updated version 1.1 of the bépo layout 
(and some personal prototypes not yet released).
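
As an illustration of the spacing in question, here is a minimal Python sketch that replaces an ordinary space (or NBSP) next to French punctuation with U+202F. It assumes NNBSP is wanted before all of ? ! ; : and inside « », which is the usage argued for in this message; some style guides prefer U+00A0 before the colon and inside the quotation marks. The function name french_spacing is hypothetical.

```python
import re

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

def french_spacing(text: str) -> str:
    """Hypothetical pass: narrow no-break spaces around French punctuation."""
    # SP or NBSP before ? ! ; : or the closing guillemet becomes NNBSP.
    text = re.sub(r"[ \u00a0](?=[?!;:\u00bb])", NNBSP, text)
    # SP or NBSP right after the opening guillemet becomes NNBSP.
    text = re.sub(r"(?<=\u00ab)[ \u00a0]", NNBSP, text)
    return text

print(french_spacing("Quoi ? Il a dit : « non » !"))
```

The sketch only converts spaces that are already present; it does not insert missing ones, so unspaced text such as "12:30" or "http://" is left alone.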

Not using angle quotation marks doesn’t fix it, given four other punctuation 
marks still need spacing (and are almost forcibly spaced with SP by lack of 
anything better), and given not using angle quotation marks makes any French 
text harder to read when there is no means to distinguish citation quotes 
« … » and scare quotes “…” following a scheme that may not be well known yet. 
See already [5] (with the reader comments) for an overview of the problem.

Thank you for your attention.

Best regards,

Marcel

[1] TUS version 3, chapter 6, page 150, table:
https://www.unicode.org/versions/Unicode3.0.0/ch06.pdf#%5B%7B%22num%22%3A4%2C%22gen%22%3A0%7D%2C%7B%22name%22%3A%22XYZ%22%7D%2Cnull%2C214%2Cnull%5D

[2] TUS version 10 (the last one having detailed bookmarks), ch. 13, p. 534:
https://www.unicode.org/versions/Unicode10.0.0/ch13.pdf#I1.27802

[3] TUS version 7, chapter 6, page 265:
https://www.unicode.org/versions/Unicode7.0.0/ch06.pdf#G17097

[4] TUS version 11, chapter 6, page 265 (no direct link):
https://www.unicode.org/versions/Unicode11.0.0/ch06.pdf#G1834

[5] « Les antiguillemets comme symboles de la postvérité », /Le Devoir/, 2016-12-30 (in French):
https://www.ledevoir.com/societe/actualites-en-societe/488139/mises-aux-points-les-antiguillemets-comme-symboles-de-la-postverite

Received on Wed Oct 31 2018 - 19:21:21 CDT
