Re: (base as a combing char)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 27 2004 - 17:17:52 CST

    From: "John Cowan" <jcowan@reutershealth.com>
    > the need to encode Dutch
    > ij as a single character, which is neither necessary nor practical.
    > (U+0132 and U+0133 are encoded for compatibility only.) In cases where
    > ij is a digraph in Dutch text, i+ZWNJ+j will be effective.

    I suppose you meant the rare cases in Dutch where ij is NOT a digraph
    for a single letter, for which i+ZWNJ+j could be effective... if only it
    were not contrary to tradition (and to many legacy encodings and
    keyboards), which generate U+0132 and U+0133, or a y/Y with diaeresis,
    when ij is a digraph, and treat a plain i+j as two distinct letters
    rather than a digraph.

    An ambiguity will remain in Dutch for a long time, simply because
    ISO-8859-1 (U+0000 to U+00FF) is too often the only subset offered to
    Dutch typists, and it contains neither U+0132/U+0133 nor ZWNJ. In that
    case, those who want the distinction often use a y with diaeresis
    (U+00FF) for the lowercase digraph, and don't mark the difference for
    the uppercase one, which occurs much more rarely, because there is no
    uppercase Y with diaeresis in ISO-8859-1. (Windows users can, however,
    use an uppercase Y with diaeresis, U+0178, to mark the single-letter
    digraph, because it is present in Windows code page 1252 at position
    0x9F.)
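    The repertoire claims above can be checked with Python's built-in
    codecs. This is a minimal sketch, assuming Python 3, where the `latin-1`
    codec implements ISO-8859-1 and `cp1252` implements Windows code page
    1252; the helper name `encodable` is hypothetical.

```python
def encodable(ch: str, codec: str) -> bool:
    """Return True if the character ch can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Characters discussed in the message.
CHARS = {
    "U+0132 IJ ligature": "\u0132",
    "U+0133 ij ligature": "\u0133",
    "U+200C ZWNJ": "\u200c",
    "U+00FF y diaeresis": "\u00ff",
    "U+0178 Y diaeresis": "\u0178",
}

for name, ch in CHARS.items():
    print(f"{name:20} latin-1={encodable(ch, 'latin-1')} "
          f"cp1252={encodable(ch, 'cp1252')}")

# U+0178 sits at byte 0x9F in code page 1252.
print("\u0178".encode("cp1252"))  # b'\x9f'
```

    Only the two y-with-diaeresis rows report True for cp1252, and only the
    lowercase one for latin-1; neither ligature nor ZWNJ is available in
    either codec.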

    I doubt we will ever see a ZWNJ key mapped on standard Dutch keyboards,
    given that most occurrences of i+j as two separate letters come from
    rare words imported from other languages. (Windows Notepad and some
    Windows text input components do, however, include a context menu to
    insert this formatting control...)

    The problem with ZWNJ is that it encodes only a typographic distinction,
    not the semantic one that Dutch users would expect: it carries no
    meaning of its own, and its rendering is optional. Those who want a
    strong distinction will more likely use U+0132 and U+0133 in their word
    processors, assisted by Dutch spelling checkers, so that they just need
    to type "i" then "j" and let the word processor replace the two letters
    with the ij ligature letter where appropriate, leaving other instances
    unchanged.
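    A minimal sketch of such dictionary-assisted substitution, in Python:
    every i+j inside a word becomes U+0133 (or U+0132 for "IJ" and a
    capitalized "Ij"), unless the whole word is listed as an exception. The
    exception set and the function names are hypothetical; a real Dutch
    corrector would draw on a full lexicon rather than two sample compounds.

```python
import re

# Hypothetical exception list: compounds in which i and j belong to
# different parts (mini+jurk, ski+jas) and must stay two letters.
EXCEPTIONS = {"minijurk", "skijas"}

IJ_LOWER = "\u0133"  # LATIN SMALL LIGATURE IJ
IJ_UPPER = "\u0132"  # LATIN CAPITAL LIGATURE IJ


def _ligate_word(match: re.Match) -> str:
    word = match.group(0)
    if word.lower() in EXCEPTIONS:
        return word  # leave the two letters unchanged
    # Both "IJ" and a capitalized "Ij" map to the capital ligature.
    word = word.replace("IJ", IJ_UPPER).replace("Ij", IJ_UPPER)
    return word.replace("ij", IJ_LOWER)


def substitute_ij(text: str) -> str:
    """Replace i+j with U+0133/U+0132 in every word not in EXCEPTIONS."""
    return re.sub(r"\w+", _ligate_word, text)


print(substitute_ij("Wij zagen een minijurk in IJmuiden"))
```

    Checking whole words against the exception set, rather than scanning
    letter pairs, keeps the rare non-digraph cases intact while leaving the
    frequent case as the default.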

    As the ij ligature letter is almost certainly the most frequent case
    when entering Dutch text, substitution could be the default behavior of
    a Dutch input method, and the assisting dictionary would only need to
    list the rare cases where the substitution must not occur. (The
    substitution would also not occur within text sections marked as
    belonging to another language, and users could cancel this automatic
    substitution with Backspace in their word processor.)
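    The input-method behavior just described can be sketched as a small
    state machine in Python. This is an illustration under stated
    assumptions, not any real input method: the class name
    `DutchIjInput`, the `lang` tag, and the use of "\b" for the Backspace
    key are all hypothetical.

```python
IJ_LOWER = "\u0133"  # LATIN SMALL LIGATURE IJ


class DutchIjInput:
    """Hypothetical input buffer: typing 'i' then 'j' yields U+0133 unless
    the text is tagged as another language; a Backspace right after the
    substitution cancels it and restores the two letters."""

    def __init__(self, lang: str = "nl") -> None:
        self.buffer: list[str] = []
        self.lang = lang
        self._just_substituted = False

    def key(self, ch: str) -> None:
        if ch == "\b":  # hypothetical code for the Backspace key
            self._backspace()
        elif self.lang == "nl" and ch == "j" and self.buffer[-1:] == ["i"]:
            self.buffer[-1] = IJ_LOWER  # default: substitute the ligature
            self._just_substituted = True
        else:
            self.buffer.append(ch)
            self._just_substituted = False

    def _backspace(self) -> None:
        if self._just_substituted:
            self.buffer[-1:] = ["i", "j"]  # undo the automatic substitution
        elif self.buffer:
            self.buffer.pop()  # ordinary deletion
        self._just_substituted = False

    def text(self) -> str:
        return "".join(self.buffer)


ime = DutchIjInput()
for ch in "wij":
    ime.key(ch)
print(ime.text())  # prints wĳ (with U+0133)
ime.key("\b")
print(ime.text())  # prints wij (two letters restored)
```

    A buffer created with another language tag, such as `lang="en"`, never
    substitutes, which mirrors the rule for text sections marked as
    belonging to another language.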

    Less capable word processors, without assisting dictionaries, could
    instead convert the y/Y with diaeresis typed by users into U+0132/U+0133
    (a solution that would be quite easy for Belgian and French users, who
    can readily use the diaeresis dead key, which is also useful for
    entering French text)...
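    That fallback amounts to a simple character mapping. A minimal Python
    sketch, assuming the input is legacy code page 1252 text in which
    U+00FF and U+0178 were used to stand for the ij digraph; the names
    `DIAERESIS_TO_IJ` and `convert_diaeresis` are hypothetical.

```python
# Map y/Y with diaeresis (U+00FF, U+0178) to the ij ligature letters.
DIAERESIS_TO_IJ = str.maketrans({"\u00ff": "\u0133", "\u0178": "\u0132"})


def convert_diaeresis(text: str) -> str:
    """Replace every y/Y with diaeresis by the lowercase/uppercase ij letter."""
    return text.translate(DIAERESIS_TO_IJ)


# Legacy cp1252 bytes: 0x9F decodes to U+0178, 0xFF to U+00FF.
legacy = b"\x9fsland en w\xff"
print(convert_diaeresis(legacy.decode("cp1252")))  # prints Ĳsland en wĳ
```

    Because the mapping is blind, it would also rewrite a legitimate y with
    diaeresis (as in the French place name L'Haÿ-les-Roses), so it only
    suits text known to be Dutch.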

    This means that documents produced by modern word processors will
    contain many U+0132/U+0133 characters, clearly distinct from the other
    cases where i and j are left as separate letters; and ZWNJ will not be
    needed!
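    One caveat, since U+0132 and U+0133 are compatibility characters: the
    distinction survives canonical normalization (NFC) but not
    compatibility normalization (NFKC), which folds U+0133 back into plain
    i+j. A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

word = "w\u0133"  # "wij" spelled with U+0133

print(word == "wij")                              # False: distinct code points
print(unicodedata.normalize("NFC", word) == word)  # True: NFC keeps U+0133
print(unicodedata.normalize("NFKC", word))        # prints wij (two letters)
print(unicodedata.decomposition("\u0133"))        # prints <compat> 0069 006A
```

    Software that applies NFKC (for example, for search or identifiers)
    will therefore erase the distinction these characters carry.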



    This archive was generated by hypermail 2.1.5 : Sat Nov 27 2004 - 17:19:49 CST