Input methods at the age of Unicode (was: a mug)

From: Marcel Schneider <charupdate_at_orange.fr>
Date: Wed, 15 Jul 2015 11:06:41 +0200 (CEST)

On Sat, Jul 11, 2015, at 20:54, Hans Aberg wrote:

> On 11 Jul 2015, at 18:36, Johannes Bergerhausen wrote:
>>
>> As I said at TEDx in Vienna:
>> [https://www.youtube.com/watch?v=IRdupNXpm8k]

> The keyboards for different languages are essentially the same nowadays: it sends a code indicating which button is acted on and whether it is depressed or released. The computer then translates using a key map. So for a Cherokee keyboard, as discussed in the video, one would need different images on the keys if one bothers, and a key map.

> One problem here is that it is very time consuming to design such key maps. This is another shortcoming of Unicode usage: lack of input methods, in addition to the font issue.

I fully agree. These keyboard updates are consistent with Microsoftʼs new corporate ambition, which consists in empowering people to achieve more, as Microsoftʼs CEO Satya Nadella wrote to all employees on July 10, 2014 at 6:00 a.m. PT: http://bit.ly/1wRIBqD
If we understand the goal as a relative one, users will be allowed to do more than during the past few decades. Obviously, better keyboard UIs are essential in this process.

Today we are still mainly using inherited ANSI keyboards, despite using Unicode characters. Overcoming this discrepancy is urgent, and I believe that at the development level this is very easy (though it may be time consuming, as Hans warns us). Whether it is easy at the usersʼ level too depends on the amount of novelty packed into the keymap. In Cherokee, users would now probably be learning to use casing, due to the scriptʼs new extension to bicamerality.

By contrast, to convert all US American Standard keyboards to Unicode keyboards, nothing more is needed than replacing the spacing Grave with the Letter Apostrophe, and the right-hand Alt key with a Compose key, operated by the right-hand thumb. The need for U+02BC in English follows from the evidence presented in last monthʼs thread ‘A new take on the English apostrophe in Unicode’.

For example, users who want to input smart quotes without an algorithm may then type Compose, {, ", for an opening quotation mark, or Compose, ], ', for a closing single quote. Compose, Letter Apostrophe, a, yields à. This principle extends to all Latin letters and punctuation marks (about two thousand, if my estimate is correct). There will then be no more need for a separate US International keyboard layout. That layout seems to have been determined not by efficiency but by its creation environment (seemingly excluding dead key chaining), as well as by IBMʼs choice not to copy Digitalʼs Compose key (but only the inverted-T arrow keys and the six miscellaneous keys). The US Intl layout is so bad that it cannot currently be kept in use, as Mark Davis explained on Sun Jul 18 1999 - 13:47:47 EDT (http://www.unicode.org/mail-arch/unicode-ml/Archives-Old/UML017/0558.html).
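
To make the idea concrete, here is a minimal sketch in Python of how such Compose sequences could be resolved. It is an illustration only, not the code of any actual driver: the names COMPOSE_TABLE and compose() are hypothetical, the table holds just the three sequences mentioned above, and the output code points (U+201C, U+2019, U+00E0) are my reading of those examples.

```python
# Hypothetical compose table: the keys typed after Compose -> the output.
# On the proposed layout the former Grave key produces U+02BC.
LETTER_APOSTROPHE = "\u02bc"  # MODIFIER LETTER APOSTROPHE

COMPOSE_TABLE = {
    ("{", '"'): "\u201c",                # opening double quotation mark
    ("]", "'"): "\u2019",                # closing single quotation mark
    (LETTER_APOSTROPHE, "a"): "\u00e0",  # a with grave
}


def compose(keys):
    """Return the character for a complete sequence, or None if unknown."""
    return COMPOSE_TABLE.get(tuple(keys))


if __name__ == "__main__":
    for seq in (["{", '"'], ["]", "'"], [LETTER_APOSTROPHE, "a"]):
        print(seq, "->", compose(seq))
```

A real implementation would also have to track partial sequences between keystrokes; the flat table above only shows the mapping itself.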

The set of all Latin letters is thus made available thanks to the chained dead keys implementation of the Compose functionality. On the other hand, designing key maps for any alphabetic language on earth appears to be rather easy: much easier, and probably far less time consuming, than writing other kinds of software. Writing keyboard drivers is essentially editing key defines, allocation tables, and deadtrans function lists. The latter two are best done with spreadsheet software. Provided that spreadsheet software (e.g. Excel 2010 Starter) is used, the job is much less complicated than its reputation suggests, because good keyboard layouts have long dead key lists, and these are not efficiently edited with the UIs of ordinary keyboard editing software. Keyboard layout sources in the format of such software may also be edited in spreadsheets and lead to good results if the dead key chaining flag is accessible. On Windows this is the case in KbdEdit, but the object modules (drivers) compiled by this software are proprietary and therefore cannot be effectively shared.
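
As a rough illustration of the spreadsheet workflow, the following Python sketch turns rows exported as CSV into lines for the dead key list of a Windows layout source. The rows are made up for the example, the function name deadtrans_lines() is hypothetical, and the DEADTRANS(base, dead, result, flags) form with flag 0x0001 marking a chained dead key is how I recall the Windows keyboard layout C sources; please check it against kbd.h before relying on it.

```python
import csv
import io

# Hypothetical spreadsheet export: base character, dead key character,
# resulting code point (hex), and whether the result is itself a dead key.
ROWS = """\
base,dead,result,chained
a,`,00E0,0
e,`,00E8,0
^,`,1EA7,1
"""
# In the last row, 1EA7 is only an arbitrary identifier for the chained
# state "grave, then circumflex" (illustrative, not from a real layout).


def deadtrans_lines(csv_text):
    """Format each CSV row as one DEADTRANS(...) source line."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumption: flag 0x0001 means "result is a further dead key".
        flags = "0x0001" if row["chained"] == "1" else "0x0000"
        lines.append(
            "DEADTRANS(0x%04x, 0x%04x, 0x%s, %s),"
            % (ord(row["base"]), ord(row["dead"]), row["result"].lower(), flags)
        )
    return lines


if __name__ == "__main__":
    print("\n".join(deadtrans_lines(ROWS)))
```

The same can of course be done with spreadsheet formulas alone; the script only shows that the transformation is mechanical.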

Editing keyboard layouts is a job that anybody can tackle who is willing to spend some time on useful work (as opposed to leisure activities like gaming, hunting and the like). Nothing is needed that is not publicly available. Thereʼs nothing to wait for.

Good luck,

Marcel

P.S.: Thereʼs a new version of the Compose Key article in Wikipedia:
https://en.wikipedia.org/wiki/Compose_key

To quickly summarize the advantages of the new US English Unicode keyboard layout and the similar UK English Unicode keyboard layout:

- Backward compatibility: Simply consider that the engraved Grave now stands for a curly apostrophe (which is very approximate but avoids keycap stickers).

- Application compatibility: The smart quotes algorithm keeps working for what it is made for, and stops being called upon for what it isnʼt made for: simulating apostrophes in all positions, including leading apostrophes.

- Adaptability: The user recovers full autonomy and can now decide by himself whether he wants an apostrophe or a quotation mark. No more workarounds are needed.

- Efficiency: The reintroduced Compose key, on right Alt, is a super dead key which allows typing huge sets of characters without much memorization, while the nearly useless** Grave accent key position suddenly becomes useful again.

- Efficacy: No more spaces are needed to type apostrophes and quotes, and no key is hijacked for a dead key any longer, except the otherwise rather useless right Alt key (a duplicate of the left one, and on the wrong side of the space bar for Alt+NumPad). No more confusion with Ctrl+Alt application shortcuts, such as AltGr used to create on Windows, while AltGr can be made available in a safe emulation thanks to a Shift + Right Alt dead key.

- Quality: Resulting text files are much more useful than versions that mix up apostrophes and single closing quotes. For computer processing, paired and unpaired punctuation marks must be clearly distinct, regardless of any glyphic resemblance, all the more so as in real English the apostrophe has letter status, not punctuation status.
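
The Quality point can be checked with a few lines of Python; this is only a small demonstration using the standard unicodedata and re modules, and the helper name words() is mine. U+02BC has the general category Lm (a letter), while U+2019 has Pf (final punctuation), so a plain word-matching pattern keeps a word intact with the former and splits it with the latter.

```python
import re
import unicodedata

LETTER_APOSTROPHE = "\u02bc"   # MODIFIER LETTER APOSTROPHE
RIGHT_SINGLE_QUOTE = "\u2019"  # RIGHT SINGLE QUOTATION MARK


def words(text):
    """Split text into runs of word characters, as a simple tokenizer would."""
    return re.findall(r"\w+", text)


if __name__ == "__main__":
    print(unicodedata.category(LETTER_APOSTROPHE))   # Lm
    print(unicodedata.category(RIGHT_SINGLE_QUOTE))  # Pf
    print(words("don" + LETTER_APOSTROPHE + "t"))    # one word
    print(words("don" + RIGHT_SINGLE_QUOTE + "t"))   # split into two pieces
```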

**I know that because the Grave is on the keyboard, it is used in markup and perhaps in programming (seemingly not in C/C++). On a Unicode keyboard, a Space following a diacritic dead key chain inserts the combining diacritic (which is against the inherited rule, dating from before combining diacritics were encoded). Since on a Unicode keyboard Shift+Space should be NBSP, spacing diacritics are inserted when the diacritic is followed by NBSP. Both behaviors are already implemented for Mac OS X: http://uscustom.sourceforge.net/. In current writing, spacing diacritics are generally much less useful than combining ones. To speed up the insertion of the spacing Grave, we might use Compose, s (for Spacing), g (for Grave). Likewise we would have spacing Acute (sa), Cedilla (sc) and Little Tilde (‘st’ or ‘slt’, not ‘lt’ which is already taken).
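
The Space and NBSP rule described in this note could be sketched as follows in Python. This is an assumption-laden illustration, not the behavior of any shipping driver: the table DIACRITICS and the function resolve() are hypothetical names, and only four diacritics are listed.

```python
import unicodedata

# Hypothetical dead key table: name -> (combining form, spacing form).
DIACRITICS = {
    "grave": ("\u0300", "\u0060"),
    "acute": ("\u0301", "\u00b4"),
    "cedilla": ("\u0327", "\u00b8"),
    "tilde": ("\u0303", "\u02dc"),
}

NBSP = "\u00a0"  # assumed to be produced by Shift+Space


def resolve(dead_key, next_char):
    """Return what is inserted when next_char follows the dead key."""
    combining, spacing = DIACRITICS[dead_key]
    if next_char == " ":
        return combining   # Space -> the combining diacritic itself
    if next_char == NBSP:
        return spacing     # NBSP -> the spacing diacritic
    # Otherwise combine with the base letter, precomposed where possible.
    return unicodedata.normalize("NFC", next_char + combining)


if __name__ == "__main__":
    print(resolve("grave", "a"))
    print(repr(resolve("grave", " ")))
    print(repr(resolve("grave", NBSP)))
```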

Along with this, word processor updates must extend the smart quotes algorithms to support the correct handling of the apostrophe. This too is rather easy to implement:

* Extended autocorrect settings will allow users to specify whether the most used squiggle is apostrophe or single quotation mark, and whether the apostrophe be U+02BC or U+2019. These toggles should be actionable by customizable keyboard shortcuts, and an info bubble and/or a flag will show whatʼs on.

* Conforming to Ted Clancyʼs proposal, a new Option setting will empower users to dedicate the Apostrophe key to the apostrophe *exclusively*, and to use the Quotation mark key for *all* quotation marks, whether double or single. This is indeed feasible in English (contrary to what I thought when replying in the thread ‘A new take on the English apostrophe in Unicode’, and unlike good French and German usage, where angle quotation marks are used for quotations and comma quotation marks for scare quotes [using angle quotes as scare quotes is bad practice]).

* Automatic quotes pairing therefore will insert matching characters at input, and check pairing at revision.

* Multiple strokes with circular output will insert the most used quotation mark after the Quotation mark key is hit once, and the other after it is hit twice. The most used one is set in the options. For example, in American English, the user may choose to get single quotes first because heʼs a scientist and needs to mark many words, while he may switch to double quotes first when writing literary text. The same should be available for the Apostrophe key: either leading apostrophe or quotation mark after one stroke, the other one after two strokes, and an appropriate sequence of both after three keystrokes. Hitting the key again will restart the cycle, and so forth. An info bubble, or colored display as suggested by William Overington on Fri, Jun 05, 2015, 11:48, could disambiguate apostrophe and quote. Alternatively, the letter apostrophe may be displayed on the customizable ‘field’ color, as NBSP and WJ are in LibreOffice.

* New Help sections may be invoked for ready information about the usefulness of the Letter Apostrophe and the features facilitating its usage. We must depart from the comfortable idea that users are unwilling to spend any thought on why and how to distinguish two characters that look identical. This idea should be considered disrespectful (contemptuous, I would say), and IMO it is probably merely a pretext for reducing production costs by lowering the product quality. (The product being the word processor, e.g. Microsoft Word.)

* An optional dialog will display every time there is an ambiguity, that is when a leading apostrophe is typed, and also when a trailing apostrophe is typed while a marked quotation is open (after an opening single quotation mark). This dialog may ask “Do you wish to type an apostrophe?” or alternately, “Is this a quotation mark?”. The choice may be set with Tab, and validated with Space.

* Users who wish to keep mixing them up will be welcome to do so (“☐ Donʼt ask me again”). This choice may be cancelled in the Settings (☑ Distinguish apostrophes and quotation marks; ☑ Display the apostrophe dialog).
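
The multiple stroke behavior in the list above amounts to cycling through a short list of outputs. Here is a minimal Python sketch under my own assumptions: the name cycle_output() and the two option lists are hypothetical, only opening marks are shown for the Quotation mark key, and the third Apostrophe key entry (an opening single quote followed by the letter apostrophe) is just one possible reading of “an appropriate sequence of both”.

```python
def cycle_output(options, strokes):
    """Output after `strokes` consecutive hits of one key, wrapping around."""
    if strokes < 1:
        raise ValueError("strokes must be at least 1")
    return options[(strokes - 1) % len(options)]


# Hypothetical user settings; the first entry is the "most used" choice.
QUOTE_KEY = ["\u201c", "\u2018"]                        # double first, then single
APOSTROPHE_KEY = ["\u02bc", "\u2018", "\u2018\u02bc"]   # apostrophe, quote, both


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, cycle_output(APOSTROPHE_KEY, n))
```

A word processor would replace the previously inserted output on each further stroke; the sketch only computes which output is current.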

For subscribers who have read this far and are willing to read on: Iʼm concerned to note that any criticism is rather easily uttered as long as the fault seems to be on the side of Unicode, a fact that would explain why Unicode bashing is deemed so popular that we can find it even on mugs (see the parent thread of this one), as if we were meant to take pleasure in repeating to ourselves every morning at breakfast that our universal charset is still useless and wonʼt work for a long time. By contrast, as soon as the responsibility ends up being shifted from the Consortium to its most powerful members, such as Apple, Google, and Microsoft, especially the latter, only very few people carry on.

In this paragraph I would like to vent more and try to debrief the Apostrophe thread, but I fear that would be too long and tiresome. I will just mention that many people monitoring this Mailing List know exactly why Unicode decided to recommend U+02BC for the English apostrophe, and know exactly how things happened when U+02BC was discarded in favor of U+2019, but that nobody agreed to disclose these pieces of information, neither when the information written up by Ted Clancy was submitted by a Mailing List subscriber, nor when I shared the results of my deciphering of early NamesList versions. Consequently, I ended up being blamed for knowing little about it.

Now I try again to learn more by submitting the following three questions:

1. Why had the UTC recommended U+02BC as apostrophe?

2. Why has the UTC withdrawn its recommendation?

3. At whose request did the UTC move the information about the preferred character for apostrophe from U+02BC to U+2019?

Answering these three questions is essential for a thorough understanding of history, which will reinforce the bases of keyboard reengineering as it must be carried out at this juncture, with the Windows 10 release imminent.

Best regards,

Marcel
Received on Wed Jul 15 2015 - 04:08:00 CDT
