Re: Dutch IJ, again

From: Philippe Verdy
Date: Wed May 28 2003 - 07:19:50 EDT

    From: "Pim Blokland" <>
    To: "Unicode mailing list" <>
    Sent: Wednesday, May 28, 2003 11:45 AM
    Subject: Re: Dutch IJ, again

    > Philippe Verdy schreef:
    > > i+j is a single combined Dutch ij character only if its not
    > followed by a vowel
    > This is not true; where did you get that idea?
    > It almost always IS a diphtong (cf words like bijen, vrijaf, zijig)
    > except where the i and the j happen to be in separate syllables
    > (bijou, bijectie).

    Do you mean that there is no possible inference rule? I did not want to be exhaustive there, because your sample words where ij is a diphthong may effectively be exceptions (or the two other words may be exceptions to the "normal" Dutch rules). I am not enough of a Dutch expert to be affirmative; I just wanted to illustrate what such a rule might look like.

    Well, it may appear that in general "ij" is always a single diphthong, unless there is a hyphenation candidate between the two letters, that is, a syllable break. In that case the problem becomes as complex as determining syllable breaks for hyphenation.
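
    A minimal sketch of that idea in Python, assuming a hypothetical exception list that holds only the two split-syllable words quoted above (a real list would need a Dutch lexicon or a hyphenation dictionary). The names SPLIT_IJ_WORDS and mark_ij are illustrative, not part of any existing library, and only lowercase "ij" is handled:

```python
# Hypothetical exception list: words where i and j belong to separate
# syllables. Only the two examples quoted in this thread are included.
SPLIT_IJ_WORDS = {"bijou", "bijectie"}

IJ_LIGATURE = "\u0133"  # LATIN SMALL LIGATURE IJ


def mark_ij(word: str) -> str:
    """Replace each lowercase 'ij' with U+0133 unless the word is a listed exception."""
    if word.lower() in SPLIT_IJ_WORDS:
        return word  # i and j are separate letters here; leave them alone
    return word.replace("ij", IJ_LIGATURE)


for sample in ["bijen", "vrijaf", "zijig", "bijou", "bijectie"]:
    print(sample, "->", mark_ij(sample))
```

    The decision is made per word, from lexical knowledge, not from the letters themselves, which is why it is as hard as finding syllable breaks.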

    For now there does not seem to be a clear definition of what a good localized breaker for grapheme clusters would be, since it also implies an analysis of syllables in Dutch or other languages. (For now, only abjads and Asian scripts seem to have a standardized algorithm for determining such grapheme clusters, and a lot of work remains for languages written in alphabetic scripts, which seem to use letters in a much more complex way than expected.)

    Still, I am not convinced that the explicit "ij" diphthong is really different from an i + j pair for Dutch, which uses a lexical approach. The combined character "ij" may be there only for compatibility with some legacy usages: most renderings of Dutch text do not let a reader tell a combined ij cluster from separate i + j letters, so the distinction does not come from the letters themselves but from the lexical knowledge of the reader.
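
    The compatibility status of the combined character can be checked with Python's standard unicodedata module. This is only an illustration of how U+0133 is defined in the Unicode Character Database; the helper name fold_ij_ligature is made up for this sketch:

```python
import unicodedata

IJ_LIGATURE = "\u0133"

# The character is named as a ligature and carries a compatibility
# decomposition to plain i + j.
print(unicodedata.name(IJ_LIGATURE))           # LATIN SMALL LIGATURE IJ
print(unicodedata.decomposition(IJ_LIGATURE))  # <compat> 0069 006A


def fold_ij_ligature(text: str) -> str:
    """Fold U+0132/U+0133 (and other compatibility characters) via NFKC."""
    return unicodedata.normalize("NFKC", text)


# Canonical normalization (NFC) leaves the ligature untouched;
# only compatibility normalization replaces it with the two letters.
assert unicodedata.normalize("NFC", IJ_LIGATURE) == IJ_LIGATURE
assert fold_ij_ligature("b\u0133en") == "bijen"
```

    Note that NFKC folds every compatibility character in the text, not only the ij ligature, so a real application may prefer a targeted replacement.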

    The special typographic case of inter-letter spacing for justification is not a serious problem, because other typographic rules already forbid excessive spacing. The exception is artificially expanded text, where the typographic effect is used to emphasize a title or a mark. That is very close to logographic design, where form is considered more important than semantics. Is such usage still text? Shouldn't it be excluded from Unicode standardization, since it requires markup that is outside the scope of Unicode, and be handled as a form of typographic *art*?

    It would be interesting to analyze the way UCA behaves for the collation of Dutch text...
