From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 27 2004 - 17:17:52 CST
From: "John Cowan" <jcowan@reutershealth.com>
> the need to encode Dutch
> ij as a single character, which is neither necessary nor practical.
> (U+0132 and U+0133 are encoded for compatibility only.) In cases where
> ij is a digraph in Dutch text, i+ZWNJ+j will be effective.
I suppose you meant the rare cases in Dutch where ij is NOT a digraph
standing for a single letter, and for which i+ZWNJ+j could be effective...
if only it did not run against tradition (and against many legacy encodings
and keyboards), which generate U+0132 and U+0133, or a y/Y with diaeresis,
when ij is a digraph, and treat a plain i+j as two distinct letters rather
than as a digraph.
An ambiguity will remain for a long time in Dutch, simply because
ISO-8859-1 (U+0000 to U+00FF) is too often the only subset offered to Dutch
typists, and it contains neither U+0132 and U+0133 nor ZWNJ. In that case,
those who want the distinction often use a y with diaeresis (U+00FF) for
lowercase, and don't mark the difference for uppercase, which occurs much
more rarely, as there's no uppercase Y with diaeresis in ISO-8859-1.
(Windows users can however use an uppercase Y with diaeresis, U+0178, to
mark the single-letter digraph, because it is present in Windows code page
1252 at code position 0x9F.)
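The coverage claims above can be checked against the codec tables that ship
with Python. This is only a minimal sketch: the helper names encodable() and
coverage() are made up for the illustration, and it shows what the two
encodings can represent, not what any particular keyboard produces.

```python
# Characters discussed in this thread, keyed by a descriptive label.
CHARS = {
    "U+0132 IJ ligature (uppercase)": "\u0132",
    "U+0133 ij ligature (lowercase)": "\u0133",
    "U+00FF y with diaeresis": "\u00ff",
    "U+0178 Y with diaeresis": "\u0178",
    "U+200C ZWNJ": "\u200c",
}


def encodable(ch, codec):
    """Return the encoded bytes, or None if the codec cannot represent ch."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


def coverage(codec):
    """Map each label in CHARS to its bytes in the codec (None if absent)."""
    return {name: encodable(ch, codec) for name, ch in CHARS.items()}


if __name__ == "__main__":
    for codec in ("iso-8859-1", "cp1252"):
        print(codec)
        for name, data in coverage(codec).items():
            shown = "0x" + data.hex().upper() if data else "not encodable"
            print(f"  {name}: {shown}")
```

Only the y with diaeresis should come out as encodable in ISO-8859-1, and
code page 1252 should add the uppercase Y with diaeresis; the two ij
characters and ZWNJ are absent from both.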
I doubt we will ever see a ZWNJ key on standard Dutch keyboards, given that
most occurrences of the non-digraph two-letter sequence i+j come from rare
imported (originally non-Dutch) words. (But Windows Notepad and some
Windows text input components include a context menu to insert this
formatting control...)
The problem with ZWNJ is that it encodes only a typographic distinction,
not the semantic one that Dutch users would expect: it carries no semantics
of its own, and its rendering is optional. Those who want a strong
distinction will more likely use U+0132 and U+0133 in their word
processors, assisted by Dutch spelling checkers, so that they just need to
type "i" then "j" and let the word processor replace the two letters with
the ij ligated letter where appropriate, leaving other instances unchanged.
As the ij ligated letter is almost certainly the most frequent case when
entering Dutch text, the substitution may be the default behavior of a Dutch
input method, and the assisting dictionary will only need to list the rare
cases where the substitution must not occur (the substitution will also not
occur within text sections marked as belonging to another language, and
users can cancel this automatic substitution with "backspace" in their word
processor).
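A rough sketch of such a default-on substitution with an exception list
might look as follows. Everything here is an assumption made for the
illustration: the function names, the regular expression, and the two
exception words (bijectie and minijurk, where i and j belong to different
syllables). A real input method would consult a full Dutch dictionary, and
this sketch does not model language-tagged sections or the "backspace" undo.

```python
import re

# Hypothetical exception list: words in which i and j are two separate
# letters. A real input method would load this from a Dutch dictionary.
EXCEPTIONS = {"bijectie", "minijurk"}

# A run of letters (no digits, no underscore), in any script.
WORD = re.compile(r"[^\W\d_]+")


def ligate_word(word):
    """Replace IJ/ij by U+0132/U+0133 unless the word is a listed exception."""
    if word.lower() in EXCEPTIONS:
        return word
    return word.replace("IJ", "\u0132").replace("ij", "\u0133")


def ligate(text):
    """Apply ligate_word to every word of text, leaving the rest untouched."""
    return WORD.sub(lambda m: ligate_word(m.group(0)), text)


if __name__ == "__main__":
    print(ligate("IJsselmeer is vrij groot"))
    print(ligate("een bijectie"))
```

The lookup is by whole word, so inflected forms such as "bijecties" would
each need their own entry (or a stemming step); mixed-case "Ij" is left
alone because Dutch capitalizes both letters of the digraph.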
Other, less capable word processors, without assisting dictionaries, may
instead replace the occurrences of y/Y with diaeresis entered by users with
U+0133/U+0132 (a solution which may be quite easy for Belgian and French
users, who can readily use the diaeresis dead key, also useful for entering
French text)...
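The dictionary-free variant is little more than a two-entry translation
table. The sketch below assumes that the input may also contain the
decomposed form (y followed by U+0308), which is why it normalizes to NFC
first; the names are hypothetical.

```python
import unicodedata

# y/Y with diaeresis used as stand-ins, mapped to the ij ligated letters.
Y_DIAERESIS_TO_IJ = str.maketrans({"\u00ff": "\u0133", "\u0178": "\u0132"})


def y_diaeresis_to_ij(text):
    """Replace U+00FF/U+0178 (precomposed or decomposed) by U+0133/U+0132."""
    # NFC turns "y" + U+0308 into U+00FF, so both spellings are caught.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(Y_DIAERESIS_TO_IJ)


if __name__ == "__main__":
    print(y_diaeresis_to_ij("vr\u00ff"))
```

Such a replacement is only safe on text known to be Dutch: French uses a
genuine y with diaeresis in a few proper names (L'Haÿ-les-Roses, for
example), which this function would convert as well.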
This means that text produced by modern word processors will contain many
U+0132/U+0133 characters, clearly distinct from the other cases where i and
j are left as separate letters; and ZWNJ will not be needed!