Re: CGJ , RLM

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Nov 26 2004 - 13:13:46 CST

    From: "Mark Davis" <mark.davis@jtcsv.com>
    >I want to correct some misperceptions about CGJ; it should not be used for
    > ligatures.

    True. CGJ is a combining character that extends the grapheme cluster started
    before it, but it does not imply any linking with the next grapheme cluster
    starting at a base character.

    So even if one encodes A+CGJ+E, there will still be two distinct grapheme
    clusters, A+CGJ and E. The trailing CGJ in A+CGJ is probably just
    pollution: this CGJ has no influence on the collation order, so the
    sequence A+CGJ+E will collate like A+E, and it does not influence the
    rendering either.
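
    For reference, the properties of CGJ mentioned above can be inspected with
    Python's standard unicodedata module. This is only a small sketch; the
    variable names are mine, and it says nothing about what a particular
    renderer or collator actually does with the character:

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

cgj_name = unicodedata.name(CGJ)
cgj_category = unicodedata.category(CGJ)  # general category ("Mn" = nonspacing mark)
cgj_ccc = unicodedata.combining(CGJ)      # canonical combining class

seq = "A" + CGJ + "E"

# CGJ has no decomposition and does not compose with its neighbours,
# so normalization leaves the sequence as encoded.
nfc_unchanged = unicodedata.normalize("NFC", seq) == seq
nfd_unchanged = unicodedata.normalize("NFD", seq) == seq

print(cgj_name, cgj_category, cgj_ccc, nfc_unchanged, nfd_unchanged)
```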

    A "correct" ligaturing would be A+ZWJ+E, with the effect of creating three
    default grapheme clusters, which can be rendered as a single ligature, or
    as separate A and E glyphs if the ZWJ is ignored.

    For example, a ligaturing opportunity can be encoded explicitly in the
    French word "efficace":
    "ef"+ZWJ+"f"+ZWJ+"icace".

    Note, however, that the ZWJ prohibits breaking, even though French allows
    a hyphenation at the first occurrence, which is also a syllable break, but
    not at the second occurrence, which falls in the middle of the second
    syllable.
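
    To make the encoded sequence for "efficace" concrete, here is a minimal
    sketch in Python that builds it and lists its code points. The names
    (marked, plain, code_points) are just illustrative:

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# "ef" + ZWJ + "f" + ZWJ + "icace": ligature requested for f-f and f-i
marked = "ef" + ZWJ + "f" + ZWJ + "icace"

# Code points in encoding order, to show where the joiners sit
code_points = [f"U+{ord(c):04X}" for c in marked]

# Dropping the joiners gives back the unmarked word
plain = marked.replace(ZWJ, "")

print(" ".join(code_points))
print(plain)
```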

    I don't know how one can encode an explicit ligaturing opportunity while
    also encoding the possibility of a hyphenation (where the sequence above
    would be rendered as if the first ZWJ had been replaced by a hyphen
    followed by a newline).

    To encode the hyphenation opportunity, normally I would use the SHY format
    control (soft hyphen):
    "ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"

    If I want to encode explicit ligatures for the "ffi" cluster when it is
    not hyphenated, I need to add ZWJ:
    "ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)

    The problem is whether ZWJ will have the expected role of enabling a
    ligature if it is inserted between a letter and a SHY, instead of between
    the two ligated glyphs. In any case, the ligature should not be rendered if
    hyphenation does occur; otherwise the SHY should be ignored. So two
    renderings are to be generated, depending on the presence or absence of the
    conditional syllable break:
    - syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a
    ligature only for the "fi" pair, but not for the "ff" pair and not even for
    the generated "f"+hyphen...
    - syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e.
    with the 3-letter "ffi" ligature...
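
    The two renderings above can be modelled with a few lines of Python. This
    is only a sketch of the behavior I would expect, not of what any existing
    renderer does: the function render() and its break_at parameter are
    hypothetical, and it assumes the ZWJ sits immediately before the SHY, as
    in string (1).

```python
SHY = "\u00ad"  # SOFT HYPHEN
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# String (1) from the message
STRING_1 = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"


def render(marked, break_at=None):
    """Resolve the SHY controls of `marked`.

    break_at is the 0-based index of the SHY where the line is broken,
    or None when no break occurs. ZWJs are kept as ligature requests,
    except one placed directly before the SHY that becomes a break.
    """
    out = []
    shy_seen = 0
    for ch in marked:
        if ch != SHY:
            out.append(ch)
            continue
        if shy_seen == break_at:
            # The break wins: drop the ligature request before it,
            # then emit a visible hyphen and a newline.
            if out and out[-1] == ZWJ:
                out.pop()
            out.append("-\n")
        # A SHY that is not taken is simply ignored.
        shy_seen += 1
    return "".join(out)


no_break = render(STRING_1)        # "ef" ZWJ "f" ZWJ "icace"
first_break = render(STRING_1, 0)  # "ef-" NL "f" ZWJ "icace"
```

    Here no_break keeps both joiners (the "ffi" ligature case), while
    first_break keeps only the joiner of the "fi" pair.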

    I am not sure if the string coded as (1) above has the expected behavior,
    including for collation where it should still collate like the unmarked word
    "efficace"...

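    The collation expectation can at least be stated as a check. The sketch
    below is a naive stand-in for a real collator: it assumes SHY, ZWJ and CGJ
    are ignored at every level, and it compares the remaining letters by code
    point rather than with real collation weights. The helper name
    collation_skeleton() is mine.

```python
SHY = "\u00ad"  # SOFT HYPHEN
ZWJ = "\u200d"  # ZERO WIDTH JOINER
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Assumption: these three controls carry no collation weight at all.
IGNORABLE = {SHY, ZWJ, CGJ}

# String (1) from the message
STRING_1 = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"


def collation_skeleton(s):
    """Return s without the characters assumed to be ignorable."""
    return "".join(ch for ch in s if ch not in IGNORABLE)


same_as_unmarked = collation_skeleton(STRING_1) == "efficace"

# The marked word should sort where "efficace" would sort.
words = ["effort", STRING_1, "effet"]
ordered = sorted(words, key=collation_skeleton)
```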

