Re: CGJ , RLM

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Nov 26 2004 - 14:09:56 CST


    The statements below are incorrect, but I don't have the time to correct
    them all.

    Mark

    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: "Mark Davis" <mark.davis@jtcsv.com>
    Cc: <unicode@unicode.org>
    Sent: Friday, November 26, 2004 11:13
    Subject: Re: CGJ , RLM

    > From: "Mark Davis" <mark.davis@jtcsv.com>
    > > I want to correct some misperceptions about CGJ; it should not be used
    > > for ligatures.
    >
    > True. CGJ is a combining character that extends the grapheme cluster
    > started before it, but it does not imply any linking with the next
    > grapheme cluster starting at a base character.
    >
    > So, even if one encodes A+CGJ+E, there will still be two distinct
    > grapheme clusters, A+CGJ and E. The trailing CGJ in A+CGJ is probably
    > just clutter: this CGJ has no influence on the collation order, so the
    > sequence A+CGJ+E will collate like A+E, and it does not influence the
    > rendering either.
    >
    > A "correct" ligaturing would be A+ZWJ+E, with the effect of creating
    > three default grapheme clusters that can be rendered as a single
    > ligature, or as separate A and E glyphs if the ZWJ is ignored.
    >
    > For example, a ligaturing opportunity can be encoded explicitly in the
    > French word "efficace":
    > "ef"+ZWJ+"f"+ZWJ+"icace".
    >
    > Note, however, that the ZWJ prohibits breaking, even though French
    > allows hyphenation at the first occurrence, which is also a syllable
    > break, but not at the second occurrence, which falls in the middle of
    > the second syllable.
    >
    > I don't know how one can encode an explicit ligaturing opportunity while
    > also encoding the possibility of a hyphenation (where the sequence above
    > would be rendered as if the first ZWJ had been replaced by a hyphen
    > followed by a newline).
    >
    > To encode the hyphenation opportunity, normally I would use the SHY format
    > control (soft hyphen):
    > "ef"+SHY+"fi"+SHY+"ca"+SHY+"ce"
    >
    > If I want to encode explicit ligatures for the "ffi" cluster, if it is not
    > hyphenated, I need to add ZWJ:
    > "ef"+ZWJ+SHY+"f"+ZWJ+"i"+SHY+"ca"+SHY+"ce" (1)
    >
    > The problem is whether ZWJ will have the expected role of enabling a
    > ligature if it is inserted between a letter and a SHY, instead of
    > between the two ligated glyphs. In any case, the ligature should not be
    > rendered if hyphenation does occur; otherwise the SHY should be ignored.
    > So two renderings are to be generated, depending on the presence or
    > absence of the conditional syllable break:
    > - syllable break occurs, render as: "ef-"+NL+"f"+ZWJ+"icace", i.e. with a
    > ligature only for the "fi" pair, but not for the "ff" pair and not even
    > for the generated "f"+hyphen...
    > - syllable break does not occur, render as "ef"+ZWJ+"f"+ZWJ+"icace", i.e.
    > with the 3-letter "ffi" ligature...
    >
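
The two outcomes listed in the quoted message can be sketched in Python as follows. This only illustrates the result the poster says he expects, not behavior specified by the Unicode Standard or implemented by any particular renderer. The function name `render_plan` and its `break_at` parameter are hypothetical, introduced here for illustration.

```python
from typing import Optional

ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN

# String (1) from the quoted message.
MARKED = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"


def render_plan(text: str, break_at: Optional[int]) -> str:
    """Return the sequence the quoted message expects to be rendered.

    break_at is the 0-based index of the SHY at which the line is broken,
    or None if no break is taken (hypothetical parameter).
    """
    out = []
    shy_seen = 0
    for ch in text:
        if ch == SHY:
            if shy_seen == break_at:
                # Assumption from the message: no ligature across the break,
                # so a ZWJ directly before the taken SHY is dropped.
                if out and out[-1] == ZWJ:
                    out.pop()
                out.append("-\n")  # visible hyphen followed by a newline
            # A SHY that is not taken is simply ignored.
            shy_seen += 1
            continue
        out.append(ch)
    return "".join(out)
```

Calling `render_plan(MARKED, 0)` models the "syllable break occurs" case, and `render_plan(MARKED, None)` the case where no break occurs.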
    > I am not sure if the string coded as (1) above has the expected behavior,
    > including for collation, where it should still collate like the unmarked
    > word "efficace"...
    >
    >
    >
    >
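
For readers who want to inspect the characters discussed in the quoted message, here is a minimal Python sketch using the standard `unicodedata` module. The helper names `describe` and `strip_controls` are hypothetical. Removing the three characters is only a rough stand-in for the comparison the poster has in mind, not an implementation of the Unicode Collation Algorithm, and it says nothing about which of the quoted statements are correct.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER
ZWJ = "\u200d"  # ZERO WIDTH JOINER
SHY = "\u00ad"  # SOFT HYPHEN

# String (1) from the quoted message.
MARKED = "ef" + ZWJ + SHY + "f" + ZWJ + "i" + SHY + "ca" + SHY + "ce"


def describe(ch: str) -> str:
    """Code point, character name, and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


def strip_controls(text: str) -> str:
    """Remove CGJ, ZWJ, and SHY, leaving only the visible letters."""
    return "".join(ch for ch in text if ch not in {CGJ, ZWJ, SHY})


if __name__ == "__main__":
    for ch in (CGJ, ZWJ, SHY):
        print(describe(ch))
    print(strip_controls(MARKED))
```

The general categories show one difference relevant to the thread: CGJ is a nonspacing mark (`Mn`), while ZWJ and SHY are format controls (`Cf`).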



    This archive was generated by hypermail 2.1.5 : Fri Nov 26 2004 - 14:11:36 CST