Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

From: Philippe Verdy
Date: Thu Jul 10 2003 - 13:39:32 EDT


    On Thursday, July 10, 2003 6:42 PM, Peter Kirk wrote:

    > Anyway, I understood from the recent discussion of Hebrew that it is
    > Unicode policy not to do anything which could theoretically invalidate
    > existing text even if it could be proved that no such text existed.

    How does saying that a Grapheme Disjoiner can be used in Turkish, to keep the f from swallowing the dot above a following lowercase i, invalidate anything?

    This does not change anything: existing texts can still produce ligatures in a renderer, unless the text explicitly says not to with a Grapheme Disjoiner, or the renderer is specially tuned to support the Turkish and Azeri languages. Existing texts do not need to be re-encoded if they are already correctly labelled with their language.

    The absence of such a language specifier will never forbid a renderer from choosing an fi ligature if one is available. The exception is a renderer made conforming by correctly interpreting the Grapheme Disjoiner to mean "break the grapheme cluster here, and display the previous character(s)": the Grapheme Disjoiner itself can then be rendered as a non-spacing empty glyph, followed by the rest of the string.
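
    As a rough illustration of the renderer behaviour described above, here is a toy sketch in Python. It is not a real shaping engine. The function name shape, the glyph names, and the language tags are made up for the example, and since the Grapheme Disjoiner is only a proposal with no code point given here, U+200C (ZWNJ) is swapped in as a stand-in.

```python
# Toy model only: decides whether <f, i> becomes an fi ligature.
# ASSUMPTION: U+200C (ZWNJ) stands in for the proposed Grapheme Disjoiner.
GDJ = "\u200c"
# Languages where dotted and dotless i are distinct letters.
DOTLESS_LANGS = {"tr", "az"}


def shape(text, lang=None):
    """Return a list of (hypothetical) glyph names for text."""
    glyphs = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        nxt = text[pos + 1] if pos + 1 < len(text) else ""
        if ch == "f" and nxt == "i" and lang not in DOTLESS_LANGS:
            # No language label and no disjoiner: the ligature is allowed.
            glyphs.append("fi_ligature")
            pos += 2
        elif ch == GDJ:
            # The disjoiner breaks the cluster and shows as an empty glyph.
            glyphs.append("empty")
            pos += 1
        else:
            glyphs.append(ch)
            pos += 1
    return glyphs
```

    With this sketch, unlabelled "fin" still gets the ligature, while either a "tr" label or a disjoiner between the f and the i suppresses it, so existing text is not affected.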

    I'm still convinced that a ligature is possible for a Turkish <f, dotted-i> sequence, encoded as <f, i, dot-above>. The ligature would join the middle bar of the <f> with the top serif of the <i>, but the top-right loop of the f would simply be a small horizontal bar, separate from the dot still present on the i.

    The same ligature could be used for the encoded sequence <f, dotless-i>, so an actual font would render the glyphs for <f, i, dot-above> as a base ligature glyph for <f, dotless-i> (with a top horizontal bar for the <f> part), and add separately the <dot-above> glyph kerned into the existing <f-dotless-i> ligature.

    To force this last ligature off, we would use the encoded sequence <f, GDJ, dotless-i>.
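
    The three encoded sequences just discussed could map to glyphs roughly as follows. This is only a sketch under assumptions: shape_tr and the glyph names are invented for the example, a real font would do this in its substitution and positioning tables, and U+200C (ZWNJ) again stands in for the proposed GDJ.

```python
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
GDJ = "\u200c"         # ASSUMPTION: ZWNJ as a stand-in for the proposed GDJ


def shape_tr(text):
    """Map text to hypothetical glyph names, Turkish-aware."""
    glyphs = []
    pos = 0
    while pos < len(text):
        if text.startswith("f" + "i" + DOT_ABOVE, pos):
            # Base ligature shared with <f, dotless-i>, plus a separate dot.
            glyphs += ["f_dotlessi_ligature", "dotabove"]
            pos += 3
        elif text.startswith("f" + DOTLESS_I, pos):
            glyphs.append("f_dotlessi_ligature")
            pos += 2
        elif text[pos] == GDJ:
            # Disjoiner: empty glyph, and no ligature across it.
            glyphs.append("empty")
            pos += 1
        else:
            glyphs.append(text[pos])
            pos += 1
    return glyphs
```

    Here <f, i, dot-above> and <f, dotless-i> share one base ligature glyph, and <f, GDJ, dotless-i> falls through to separate glyphs.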

    According to Unicode, the sequence <i, dot-above> has always been valid, even though it apparently yields the same dotted glyph in all languages. It differs only in that the explicit <dot-above> makes the preceding <i> lose its own dot (because of its Soft_Dotted property), so that it is rendered dotless, followed by a forced diacritic.

    So the encoded sequence <i, dot-above> is now made "equivalent" (for rendering purposes) to <dotless-i, dot-above> (even though the two are not canonically equivalent per UAX#15, in NFC or NFD) and not "equivalent" to an isolated <i> (one not followed by any above diacritic)...
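
    The point about canonical equivalence can be checked with Python's standard unicodedata module. The helper name canonically_equivalent is made up for this sketch; it simply compares the NFD forms of two strings.

```python
import unicodedata

I_DOT = "i\u0307"             # <i, dot-above>
DOTLESS_I_DOT = "\u0131\u0307"  # <dotless-i, dot-above>


def canonically_equivalent(a, b):
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Neither sequence changes under NFC or NFD, and they stay distinct,
# so any "equivalence" between them exists only at the rendering level.
same_under_nfc = (
    unicodedata.normalize("NFC", I_DOT)
    == unicodedata.normalize("NFC", DOTLESS_I_DOT)
)
```

    By contrast, capital U+0130 is canonically equivalent to <I, dot-above>, which is one reason <i, dot-above> turns up in practice: lowercasing U+0130 without a Turkish locale gives exactly that sequence.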

    -- Philippe.
