Re: Dutch IJ, again

From: Philippe Verdy
Date: Sat May 24 2003 - 08:22:25 EDT

    From: "Karl Pentzlin"
    > Is this true:
    > "Dutch 'ij' is (besides its special casing rule) by no means more a
    > 'single character' than e.g. Slovak 'ch'. It is encoded in Unicode
    > only for historic reasons (like backward compatibility).
    > In the era of automatic ligating (OpenType etc.), no Dutch speaker
    > really needs U+0132/U+0133 or misses an 'ij' key on their keyboard."
    > - Karl

    Encoding "ij" as a separate composite character, rather than as its two abstract characters, was needed as a better palliative than other replacements such as ÿ (y with diaeresis), whose glyph can look nearly the same as the ligated form of "ij" used in some typographic styles.

    If you look at some past European standards for teletext and videotex, you'll find such "replacements" documented, and they had the bad effect of not being easily reversible. This created text databases containing what are now considered typos, which would have been better encoded as separate i + j. Each time, those typos that alter the original text were introduced because of technical constraints, independently of the language actually used when the text was first written.
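
    The reversibility point can be illustrated with a minimal Python sketch. It relies only on the standard unicodedata module and on the fact that U+0132/U+0133 carry compatibility decompositions to I + J and i + j. The functions legacy_replace and legacy_restore are hypothetical stand-ins for a lossy substitution of the kind described above, not a reproduction of any particular teletext or videotex standard.

```python
import unicodedata

IJ_LIGATURE = "\u0133"   # LATIN SMALL LIGATURE IJ
Y_DIAERESIS = "\u00ff"   # LATIN SMALL LETTER Y WITH DIAERESIS


def unfold_ij(text: str) -> str:
    """Map U+0132/U+0133 back to plain I+J / i+j.

    NFKC applies every compatibility mapping, so it also folds other
    compatibility characters; that is acceptable for this illustration.
    """
    return unicodedata.normalize("NFKC", text)


def legacy_replace(text: str) -> str:
    """Hypothetical lossy substitution: write every 'ij' as y with diaeresis."""
    return text.replace("ij", Y_DIAERESIS)


def legacy_restore(text: str) -> str:
    """Naive attempt to undo legacy_replace.

    It cannot tell a substituted 'ij' from a genuine y with diaeresis,
    so text that legitimately contains the latter gets corrupted.
    """
    return text.replace(Y_DIAERESIS, "ij")


if __name__ == "__main__":
    # The dedicated character unfolds cleanly to its two letters.
    print(unfold_ij(IJ_LIGATURE + "s"))

    # The substitution round-trips for a Dutch word...
    print(legacy_restore(legacy_replace("ijs")))

    # ...but damages a French name that really contains y with diaeresis.
    print(legacy_restore(legacy_replace("L'Ha\u00ff-les-Roses")))
```

    The dedicated character can always be mapped back to i + j, whereas the substituted form collides with text that already used the replacement character for its own purpose.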

    Some technical constraints come from encoding length limits (notably in relational databases), and the "approximations" introduced at one time in fact act like abbreviations. The bad effect is that this tends to alter the language, not because of the way people use it but because of these technical constraints. Look for example at the "neologisms" or "barbarisms" coming from the recent growth of SMS messaging on mobile phones, where the language is altered so severely that the technical constraints not only create a sort of new language but also alter the understanding of the normal language used by people communicating with each other.

    Let's not forget that a language is first of all a communication medium for sentient people, not for machines... The consequence is that language is inventive and carries many cultural aspects and much implied experience, which no formal or technical system should prohibit. So whatever the standard used, its role is not to forbid creativity or to add more constraints to it; it should only be there to facilitate communication between people, with machines used only as a convenient but not necessary tool. That's why I think it's much more important to allow encoding semantic relations in the transported text than exact character sequences or glyphs.

    For this reason, I don't consider that the technical encoding of "ij" was wrong, as it's a way to remove some technical limits. So I don't consider it a "historic" character encoded only for compatibility reasons. If it was used in the past in some specific encoding, it was done for the benefit of sentient people who prefer seeing it to seeing arbitrarily truncated names displayed on devices with severe technical constraints (look at mobile phones today, with their limited keyboards and displays, and at how better technology can facilitate both the correct input of text on such devices and its correct rendering...)

    In fact, if you look more closely, you'll see that many languages have seen their script severely impacted by technical constraints since Gutenberg: English has lost nearly all its accents and diacritics, and even Nordic letters like thorn. Later, some languages traditionally transmitted orally were written down using a restricted alphabet: Spanish and Breton (and probably French and English too) would have been better represented with additional letters for the "ch" or "sh" pairs, or even the English "oo" or French "ou" pair.

    Technical limitations on scripts have also altered the language (the nasalized vowels and guttural consonants have nearly disappeared from English despite its combined Roman, Celtic and Saxon origins), in such a way that what began as a phonetic system now uses a technically oriented and normative orthography that is far from the spoken language (that's why we need IPA transcriptions in all English dictionaries). This creates two distinct languages: the oral form, wrongly considered "popular" or "vulgar", and the written (literary) form, wrongly considered "noble". When the written language is brought back to voice by reading aloud, some texts are nearly undecipherable for common people, as the written form has a different rhythm and often a different system of lexical and semantic references...

    This distinction between the oral and written language is very easy to demonstrate for other languages as well, for example Arabic, with its "unique" and prescriptive written form inherited from the Koranic tradition, as opposed to the many popular oral variants used in various countries.

    This is also true for Chinese and even Japanese, where the written script is the only commonly used medium through which people can understand each other; illiterate people in these areas are severely constrained in their daily lives by their inability to use this "technical" and in fact very complex written language, with its many unexplained conventions, which can be considered a very valuable culture but also an oppressive social constraint. Understanding a language necessarily goes far beyond the normative standards normally created to simplify communication.
