RE: Accented ij ligatures (was: Unicode Public Review Issues update)

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Jul 01 2003 - 06:04:27 EDT


    > > I don't know of any instances where a ij digraph would keep the dots
    > > AND get additional accent marks, nor of any where the ij would
    > > appear with a dotless i and dotless j and a single dot above,
    > > centered between them. Can you give examples?
    >
    > No of course:

    So why do you care?

    > the only sequence I know is a dotless ij digraph with
    > a centered acute accent.

    Not heard of that before; but if so, that's fine, and makes the
    ij-ligature useful for one more thing than I listed!

    > I just wonder if this public review makes
    > things clear that the presence of an acute accent is supposed to
    > remove both dots.

    Yes, but there is no need to overemphasise it.

    > For now I have seen some fonts keeping
    > the two dots, when centering an additional acute accent.

    Undesired behaviour.

    > The text of this update should specify that for this pair, the
    > intended option is to remove both soft dots, if there are other
    > diacritics.

    There is no need to overemphasise it.

    > But if one wants to restore the previous visual behavior, even if it's
    > incorrect for languages using this digraph as a letter, what would be
    > the behavior of using the following sequence:
    > <ij, combining dot above, combining acute>
    > (i.e. should this display 1 or 2 dots?)

    One (for consistency with usual behaviour), but it does not really
    matter.

    > Should the previous incorrect rendering be approximated with:
    > <ij, combining diaeresis, combining acute>

    Yes, but it does not really matter, since this isn't used.

    > or
    > <ij, combining dot above, combining dot above, combining acute>

    That should stack the two dots above each other, as usual.
    (There is absolutely no need for special rules here; this is not
    at all comparable to Vietnamese.)
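
    The "as usual" above can be checked against the character data. The
    following is a minimal sketch, assuming Python's standard unicodedata
    module as the reference; the variable names are illustrative only. It
    shows that dot above, diaeresis and acute all have combining class
    230, so normalization never reorders them relative to each other and
    the order in which they are typed is the order in which they stack.

```python
import unicodedata

IJ = "\u0133"         # LATIN SMALL LIGATURE IJ
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
DIAERESIS = "\u0308"  # COMBINING DIAERESIS
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT

# Canonical combining class of each mark discussed in this thread.
marks = {"dot above": DOT_ABOVE, "diaeresis": DIAERESIS, "acute": ACUTE}
classes = {name: unicodedata.combining(ch) for name, ch in marks.items()}

# Two sequences that differ only in the order of the marks.
two_dots_then_acute = IJ + DOT_ABOVE + DOT_ABOVE + ACUTE
acute_then_two_dots = IJ + ACUTE + DOT_ABOVE + DOT_ABOVE

# Marks of equal combining class keep their relative order under NFD.
nfd_a = unicodedata.normalize("NFD", two_dots_then_acute)
nfd_b = unicodedata.normalize("NFD", acute_then_two_dots)

print(classes)
print(nfd_a == two_dots_then_acute, nfd_a == nfd_b)
```

    How the stacked marks are actually drawn remains a matter for the
    font and the rendering system; the sketch only covers the character
    level.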

    > > So if you want two dots and an acute use ‹ij, U+0308, U+0301›: ij¨´
    > >
    > > Of course a given font’s diaeresis will often not line up with the
    > > stems of its ij, and a custom one should be used instead. Or
    > > features and/or ligs as appropriate to the font’s technology could
    > > just use the ‹ij› glyph w/ an extra acute. Either way it is a glyph
    > > issue rather than a character issue.
    >
    > Doesn't it create a new equivalence for the sequences
    > <ij, diaeresis> and <ij>

    No. Note that there is no canonical or compatibility equivalence
    between i and a dotless i with a combining dot above.
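
    The lack of equivalence can be verified mechanically. This is a
    minimal sketch, assuming Python's standard unicodedata module; the
    helper equivalent_under is a hypothetical name introduced here. It
    lists the normalization forms under which two strings become
    identical: none for i versus dotless i plus dot above, none for ij
    versus ij plus diaeresis, and only the compatibility forms for the
    ij-ligature versus the two letters i and j.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def equivalent_under(a, b):
    """Return the normalization forms under which a and b become identical."""
    return [
        form for form in FORMS
        if unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
    ]

plain_i = "i"
dotless_i_dot = "\u0131\u0307"  # dotless i + COMBINING DOT ABOVE
ij = "\u0133"                   # LATIN SMALL LIGATURE IJ
ij_diaeresis = "\u0133\u0308"   # ij + COMBINING DIAERESIS

i_vs_dotless = equivalent_under(plain_i, dotless_i_dot)
ij_vs_ij_diaeresis = equivalent_under(ij, ij_diaeresis)
ij_vs_letters = equivalent_under(ij, "ij")

print(i_vs_dotless)        # no form makes these equal
print(ij_vs_ij_diaeresis)  # no form makes these equal
print(ij_vs_letters)       # compatibility forms only
```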

    > if neither of them are followed by another combining above diacritic?
    > If we don't want such equivalences, the Unicode standard should
    > say then that it's illegal to use two consecutive identical combining

    No, that is and should remain "legal". (Some rendering systems seem
    to maintain that some combining sequences are "illegal" or "malformed"
    in some way; this is an error in those rendering systems, and that
    notion is unsupported by Unicode.)

    > diacritics. Or simply forbid using <ij,diaeresis> alone (not followed
    > by another diacritic with CC=230).

    No, (as above). Note that it is perfectly legal to put a dot above an
    i in any case, or even above a dotless i. And there is no equivalence!
    (Other than that the resulting glyphs *ideally* look the same.)

    > Yes this is really tricky,

    No. It's a very simple proposal.

    > and academic,

    From your point of view: very! (And I don't know why you try to
    mess things up beyond reason.)

    > I admit. But what forbids
    > encoding two superposed arrows above any letter? Or encoding
    > a <ij,macron> (with the dots removed from ij) followed by diaeresis,

    Nothing, and nothing. (As it should be.)

    > which could have a mathematical meaning?

    Using the ij-ligature in math expressions would be very ill-advised
    (unless part of a Dutch word used in its entirety as a variable name).

                    /kent k

    (Side issue:

    > > > The proposal however is fine for the mathematical variants of i and
    > > > j, (including the double struck italic, for unification reasons)
    > >
    > > I think so too (though I don't know what you mean by "unification"
    > > here).
    >
    > I am speaking about the few "holes" in the mathematics block, which
    > were unified with pre-existing characters in other blocks. So if the
    > update is accepted for the new mathematics block, it must be
    > accepted also for these characters not present in these holes but
    > unified with characters of previously encoded blocks.

    There are no double-struck italic letters in plane 1.

    end side issue)
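
    The side-issue claim can be checked by scanning character names. This
    is a minimal sketch, assuming the Unicode data bundled with the
    Python interpreter in use; the helper named is a hypothetical name
    introduced here. It collects characters whose name contains
    "DOUBLE-STRUCK ITALIC" in plane 1 and in the BMP.

```python
import unicodedata

def named(first, last, needle):
    """Collect (code point, name) pairs in [first, last] whose name contains needle."""
    hits = []
    for cp in range(first, last + 1):
        # Unassigned code points and surrogates have no name; use "" for them.
        name = unicodedata.name(chr(cp), "")
        if needle in name:
            hits.append((f"U+{cp:04X}", name))
    return hits

plane1 = named(0x10000, 0x1FFFF, "DOUBLE-STRUCK ITALIC")
bmp = named(0x0000, 0xFFFF, "DOUBLE-STRUCK ITALIC")

print(plane1)
for code, name in bmp:
    print(code, name)
```

    The double-struck italic i and j turn up only in the BMP, in the
    Letterlike Symbols block, while the plane 1 scan comes back empty.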


