Re: (base as a combing char)

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Nov 27 2004 - 15:12:55 CST

    From: "Addison Phillips [wM]" <aphillips@webmethods.com>

    > For example, Dutch sometimes treats the sequence "ij" as a single letter
    > (it turns out that there are characters for the letter 'ij' in Unicode
    > too, but they are for compatibility with an ancient non-Unicode character
    > set). Software must be modified or tailored to provide behavior consistent
    > with the specific language and context.

    Not sure about that: not all Dutch "ij" letter pairs are a single grapheme,
    so there are cases where the two letters must be treated as distinct and not
    as a single letter. For this reason, Dutch needs a distinct "ij" letter,
    coded as a single character, with its own capitalization rules: the
    uppercase or titlecase form of "ij" is the single letter "IJ", not two
    letters and not "Ij". There are also cases where diacritics are added on
    top of the "ij" letter, which then behaves even more like a single letter
    than like a simple digraph.
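
    As a rough illustration of the capitalization point, here is a minimal
    Python sketch. The code points U+0133 and U+0132 are the single coded
    "ij" and "IJ" characters; the helper name dutch_titlecase is hypothetical,
    and it assumes that a leading "ij" is always the single letter, which is
    only a heuristic.

```python
IJ_SMALL = "\u0133"    # LATIN SMALL LIGATURE IJ
IJ_CAPITAL = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# The single coded character has a one-to-one case mapping.
assert IJ_SMALL.upper() == IJ_CAPITAL
assert IJ_CAPITAL.lower() == IJ_SMALL

# Default titlecasing of the two-letter spelling capitalizes only the "i".
assert "ijsselmeer".title() == "Ijsselmeer"


def dutch_titlecase(word: str) -> str:
    """Hypothetical helper: titlecase one word for Dutch.

    Assumption: a leading "ij" is the single letter, so both halves are
    capitalized together. Words starting with anything else get the
    ordinary treatment.
    """
    if word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    return word[:1].upper() + word[1:].lower()
```

    With this tailoring, dutch_titlecase("ijsselmeer") gives "IJsselmeer",
    while a word spelled with the single coded character simply relies on the
    character's own case mapping.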

    This distinction is also often made visible in the typography. The
    single-letter "ij" is shown with the leg of the "j" kerned deeply below
    (and sometimes to the left of) the leading "i"; when they are treated as
    two letters, no kerning occurs and the 'i' is shown completely to the left
    of the bottom-left leg of the 'j'. It is even more evident in the uppercase
    style: two distinct letters keep the standard small distance between the I
    and J glyphs, whereas in the single letter the uppercase I may be drawn in
    the middle of the left leg of the J.

    Note the very close resemblance of the "ij" single letter to a y with
    diaeresis (so you'll also find Dutch texts that use y with diaeresis instead
    of the correct "ij" letter, notably in texts coded with legacy charsets).
    The same substitution is made in uppercase, where the missing "IJ" single
    letter appears encoded as Y with diaeresis...

    These cases in Dutch where there's a distinction between the single letter
    digraph and two letters are rare, so it is often acceptable to encode the
    digraph with two letters, without creating linguistic ambiguities (in most
    cases...), or with y with diaeresis/umlaut (which otherwise is not a letter
    used in Dutch).
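
    To make the three spellings concrete, here is a small Python sketch that
    converts between them. The helper names repair_legacy_ij and
    to_two_letters are hypothetical, and the y-with-diaeresis mapping assumes
    the text is known to be Dutch (in other languages that letter is a real
    letter and must not be touched).

```python
import unicodedata

# Legacy stand-ins for the single letter: y / Y with diaeresis
# (U+00FF, U+0178), mapped to the single coded letters U+0133 / U+0132.
LEGACY_TO_IJ = str.maketrans({"\u00ff": "\u0133", "\u0178": "\u0132"})


def repair_legacy_ij(text: str) -> str:
    """Hypothetical helper: replace y-with-diaeresis by the coded IJ letter.

    Assumption: the text is Dutch, where y with diaeresis is not otherwise
    used as a letter.
    """
    return text.translate(LEGACY_TO_IJ)


def to_two_letters(text: str) -> str:
    """Hypothetical helper: spell the single coded letter as two letters.

    This is lossy: afterwards the single letter can no longer be told
    apart from a genuine "i" followed by "j".
    """
    return text.replace("\u0133", "ij").replace("\u0132", "IJ")


# Compatibility normalization performs the same two-letter replacement,
# because the single character has a compatibility decomposition.
assert unicodedata.normalize("NFKC", "\u0133") == "ij"
assert unicodedata.normalize("NFKC", "\u0132") == "IJ"
```

    Going from the two-letter spelling back to the single letter is the
    direction that cannot be automated safely, since it requires knowing which
    "ij" pairs are really one letter.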

    For me, your allusion to legacy charsets is about the deprecated use of y
    with diaeresis, not about the use of a distinct "IJ" letter, which is needed
    for Dutch and should be treated as distinct from the "I then J" letter
    pair.


