Re: Support for Latin ligature IJ

From: Marcel Schneider
Date: Thu, 31 Mar 2016 06:04:49 +0200 (CEST)

On Wed, 30 Mar 2016 23:42:20 +0200, Philippe Verdy wrote:

> Note that the single letter "ĳ" in Dutch is often indistinguishable from "ÿ", which is also commonly found as a convenient substitute in many old documents encoded not with Unicode but with ISO8859-1. This has a caveat: capitalization would produce "Y" (in ISO8859-1), possibly followed by a combining diaeresis (in Unicode-encoded documents), instead of "IJ" (more correct but not perfect) or the "Ĳ" letter (best choice).

The uppercase ‘IJ’ was also quite regularly represented as ‘Y’ in Dutch pre-computer texts and signage.

Sad to say, by excluding three French characters (Ÿ, œ, Œ) and missing four Finnish ones, Latin-1 was not what could have been called a Western European charset, even allowing for the fact that the euro sign could not have been anticipated.
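
The gap can be checked mechanically. The following is only a sketch: the helper name encodable is made up for illustration, and the four "Finnish" letters are an assumption (Š, š, Ž, ž, the ones later added in ISO 8859-15), since the message does not list them. It also shows the capitalization caveat: lowercase ÿ exists in Latin-1, but its uppercase does not.

```python
# French characters absent from Latin-1 (capital Y with diaeresis, oe ligatures).
FRENCH_MISSING = "\u0178\u0153\u0152"            # Ÿ œ Œ
# Assumption: the "four Finnish ones" are S/Z with caron, not listed in the message.
FINNISH_ASSUMED = "\u0160\u0161\u017d\u017e"     # Š š Ž ž


def encodable(ch: str, charset: str = "latin-1") -> bool:
    """Return True if the character can be encoded in the given charset."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Every character above that Latin-1 cannot represent.
latin1_gaps = [c for c in FRENCH_MISSING + FINNISH_ASSUMED if not encodable(c)]

# Lowercase y with diaeresis is in Latin-1 (0xFF), but uppercasing it
# yields U+0178, which is not.
upper_of_y_diaeresis = "\u00ff".upper()
```

All seven characters land in latin1_gaps, while the same check against "iso8859-15" succeeds for each of them.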

> The use of "ÿ" in Dutch should also be considered an orthographic fault and corrected to "ij" (to solve the capitalization problem), but there are occurrences of "ÿ" in Dutch that are correct (notably in borrowed French toponyms such as "L’Haÿ-les-Roses").
> There may be similar examples in Belgium with French toponyms, but I suspect that those Belgian-French toponyms have their own "officialized" Dutch variant, which would be preferable to borrowing the Belgian-French orthography. They would then not need "ÿ" and would likely use "ij" instead, so autocorrecting "ÿ" to "ij" in possible Belgian-French toponyms would also be correct for Dutch-Belgian toponyms. It may also be correct for French-French toponyms like "L’Haÿ-les-Roses", autocorrected to "L’Haij-les-Roses" in Belgian Dutch, or "L’HAIJ-LES-ROSES" if capitalized. It would, however, be incorrect to replace the "ĳ" (or "Ĳ") letter there by the two letters "ij" (or "IJ") without the orthographic ligature...
> Out of curiosity, I looked at the Dutch Wikipedia to see how they write "L’Haÿ-les-Roses": they don't transform the French "ÿ" into some Dutch "ij" (and they don't have any other "officialized" Dutch orthography).
> For this reason, the autocorrection of the "ÿ" letter into the "ĳ" letter in Dutch is disabled by default (even if it would be needed when looking into old documents encoded with ISO8859-1).
> The situation is more complex for the autocorrection of the "ij" digram (extremely frequent in old documents encoded with ISO8859-1) into the plain "ĳ" letter, which seems to be active in various word processors (but which causes problems with borrowed non-Dutch names).

Yet another example of how autocorrection, designed to keep outdated keyboard layouts in use, risks ending in a mess.
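
To make the risk concrete, here is a minimal sketch of such a digram autocorrection. It does not reproduce any real word processor: the function autocorrect_ij and its exception list are hypothetical, and the list is deliberately tiny.

```python
import re

# Hypothetical exception list of borrowed names that must keep "i" + "j".
EXCEPTIONS = {"Dijon", "Fiji"}


def autocorrect_ij(text: str, exceptions=EXCEPTIONS) -> str:
    """Replace the digram ij/IJ by U+0133/U+0132, word by word.

    Words found in the exception list are left untouched. A title-case
    "Ij" is not handled by this sketch.
    """
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word in exceptions:
            return word
        return word.replace("IJ", "\u0132").replace("ij", "\u0133")

    return re.sub(r"\w+", fix, text)


corrected = autocorrect_ij("Het ijs op het IJsselmeer bij Dijon")
```

With the exception list in place, "Dijon" survives; called with exceptions=set(), the same function turns it into "Dĳon", which is exactly the kind of mess described above. Any such list is bound to be incomplete.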

> 2016-03-30 23:19 GMT+02:00 Philippe Verdy:
> > In my opinion, the Dutch Ĳ/ĳ "ligature" is not really a ligature and should be treated exactly like Æ/æ or Œ/œ as a plain single letter.

I fully agree that these are all plain letters. Consistently, Unicode originally encoded them all as such: LATIN CAPITAL LETTER I J, LATIN CAPITAL LETTER A E, LATIN CAPITAL LETTER O E. The misleading “LIGATURE” names were imposed by ISO and subsequently partially corrected by Unicode at the request of the national body (NB) mainly concerned. ‘IJ’ too is considered a letter in Dutch. In French, the administrative point of view is that ‘Œ’ and ‘OE’ are equivalent, and a representative of the linguistic authority has agreed to that.
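
For reference, the current character names can be listed with Python's standard unicodedata module. This is a small sketch; the dictionary name current_names is only illustrative.

```python
import unicodedata

# Capital and small forms of IJ, AE and OE.
LETTERS = "\u0132\u0133\u00c6\u00e6\u0152\u0153"

# Map each code point (as U+XXXX) to its current Unicode character name.
current_names = {f"U+{ord(c):04X}": unicodedata.name(c) for c in LETTERS}

for code_point, name in current_names.items():
    print(code_point, name)
```

Only Æ/æ carry “LETTER” in their current names; IJ and OE still carry “LIGATURE”, which is the partial correction mentioned above.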

The point is that (1) one cannot ask people to use letters that are not on their keyboard, (2) one cannot ask software providers to add them in the layout driver while they arenʼt printed on keycaps, and (3) one cannot ask manufacturers to add them on the keyboard as long as that is not specified by any official standard. But all that shall now change.

Same problem (presumably) on Dutch keyboards, and here again things should soon be improved, when the future revised ISO/IEC 9995 includes a compose key, at least on Right Alt + Space. Such a gateway can be added without altering the space bar, which is the one key that does not need to be engraved, and behind it, all characters of the current script can be added without sticking anything more on the keycaps.

> >
> > The use of IJ/ij (encoded as separate letters) is actually an orthographic fault, that a ligature will not help resolve.

As for the actual meaning of “ligature”, see above, but you are completely right.

> >
> > Thanks, the decomposition of the "Ĳ" letter or "ĳ" into separate letters is only a compatibility decomposition; the two forms are not canonically equivalent.

That will help improve the cited Wikipedia article. Correcting documentation is actually a precondition for users to dare type U+0132/U+0133.
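
The difference between compatibility and canonical equivalence can be verified with Python's standard unicodedata module. A short sketch, with variable names chosen only for illustration:

```python
import unicodedata

IJ_SMALL = "\u0133"  # the single-character form discussed above

# The decomposition mapping is tagged <compat>, i.e. compatibility only.
mapping = unicodedata.decomposition(IJ_SMALL)

# Canonical normalization (NFD) leaves the character untouched ...
nfd = unicodedata.normalize("NFD", IJ_SMALL)

# ... while compatibility normalization (NFKD) splits it into i + j.
nfkd = unicodedata.normalize("NFKD", IJ_SMALL)
```

So a process that applies NFKC or NFKD silently loses the distinction, whereas NFC and NFD preserve it.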

> >
> > In such a case, the "ĳ" letter is soft-dotted also in Dutch and the two dots disappear when it has diacritics above.
> >
> > For Lithuanian, the "ĳ" letter is not soft-dotted, but effectively hard-coded (meaning also that it is really a ligature, and that the single letter should not be used at all, but encoded as i+j with a possible joiner...). In such a case, using the single letter "Ĳ/ĳ" meant only for Dutch is also an orthographic fault. But this also means that when you add diacritics in Lithuanian, you'll need to encode explicit dots (like in Turkish) to keep these dots!

The oopsie is that in some implementations, this way you get two stacked dots plus the other diacritic…
We can only hope that this is now fixed.

Received on Wed Mar 30 2016 - 23:06:34 CDT
