RE: Orthographies using ZWNJ (was: Displaying control characters)

From: Philippe Verdy
Date: Sun Jul 22 2007 - 13:47:18 CDT


    Asmus Freytag wrote:
    > > If this is something else, which options do we have to explicitly mark
    > > syllable breaks without ligatures, with or without a visible hyphen?
    > >
    > > What will happen with joining scripts (i.e. Arabic, Devanagari...) or
    > > cursive styles of alphabetic scripts? Does a prohibition of ligature
    > > also prohibit the usual joining?
    > >
    > If you had read the standard, before creating your own alternate
    > reality, you wouldn't need to ask that question. The role of ZWNJ in
    > joining is explicitly described.

    You don't need to rant about my reading of the standard. I have said in my
    message that ZWNJ was used to control ligation/joining during rendering. I
    spoke about something else.

    You assert that Unicode does not encode syllable breaks, but that is
    wrong: SHY is a perfect example of an explicit syllable break.

    I was speaking about what happens when syllable breaks and ligature control
    are combined or kept separate. That question is still not answered.

    What I have seen is that the presence of a word joiner does in practice
    prevent a ligature, although this is not specified anywhere. If it is used
    as an invisible syllable break (one that will never be rendered as a hyphen
    if a line break occurs) in compound words that are normally not separated by
    a space or hyphen, but that may still be split if needed at line boundaries,
    then I think it is normal that it prevents the formation of a ligature.

    Now the question remains: what is the effective difference between WJ and
    ZWNJ? I can't see any, both on the morphological analysis side, and on the
    rendering side.
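
    As a small illustration of why the two are hard to tell apart from basic
    character data alone, here is a minimal Python sketch. The names CANDIDATES
    and summarize are mine, chosen for illustration only. It looks at nothing
    more than the character name and General_Category that the standard
    unicodedata module exposes; that module does not report line breaking or
    joining properties, so this says nothing about what a renderer will do
    with ligatures.

```python
import unicodedata

# The three format characters discussed in this thread.
CANDIDATES = {
    "ZWNJ": "\u200C",
    "WJ": "\u2060",
    "SHY": "\u00AD",
}

def summarize(chars):
    """Map each label to (code point, Unicode name, General_Category)."""
    return {
        label: (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for label, ch in chars.items()
    }

for label, row in summarize(CANDIDATES).items():
    print(label, *row)
```

    All three come back with the same General_Category (Cf, format), so any
    difference in behaviour has to come from other properties or from the
    renderer itself.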

    If WJ is not expected to break a ligature, this should be specified, so that
    ZWNJ will be used explicitly to control that (WJ will still be used to
    control word breaks, mostly in scripts that have no required word separation
    by spaces or other punctuation marks).

    This concern arose when I replied to the message sent by Karl Pentzlin
    about the compound word "Schilfinsel" (i.e. "Schilf" + "Insel"
    without a "fi" ligature), which he wants to encode as "Schilf<ZWNJ>insel",
    where the absence of the ligature is expected to really mark the internal
    syllable break.

    German compound words (in my opinion) carry more than just a rendering hint
    (ZWNJ), and WJ is certainly better suited to expressing that. So there are
    two situations when an author is tuning the rendering of the text and uses a
    hyphenation algorithm to mark explicitly where syllable breaks will occur:

            (1) Either the syllable break is wanted and expected here, so he
    will insert a SHY between the two parts of the word; but SHY still does not
    prevent a ligature, so he will need BOTH ZWNJ (against the "fi" ligature)
    and SHY after it: the resulting string will be "Schilf<ZWNJ><SHY>insel";

            (2) Or the syllable break is not desired, and WJ will be used to say
    that explicitly (preventing an automated hyphenator from inserting a line
    break here). But as WJ is not specified to prevent the ligature (even though
    most renderers do avoid the ligature in this case), he will need to encode
    BOTH ZWNJ (against the "fi" ligature) and WJ after it (to disable any
    hyphenating line break): the resulting string will be
    "Schilf<ZWNJ><WJ>insel", rather than just "Schilf<WJ>insel".

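    To make the two encodings concrete, here is a minimal Python sketch that
    builds both strings from their code points (ZWNJ = U+200C, SHY = U+00AD,
    WJ = U+2060). The names case1, case2, CONTROLS and strip_controls are
    hypothetical, chosen for illustration; the sketch only shows what is stored
    in the text and makes no claim about how a given renderer or hyphenator
    will treat it.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
SHY = "\u00AD"   # SOFT HYPHEN
WJ = "\u2060"    # WORD JOINER

# Case (1): break allowed at the compound boundary, "fi" ligature suppressed.
case1 = "Schilf" + ZWNJ + SHY + "insel"

# Case (2): break forbidden at the compound boundary, "fi" ligature suppressed.
case2 = "Schilf" + ZWNJ + WJ + "insel"

# The invisible characters used above (an illustrative set, not a full list).
CONTROLS = {ZWNJ, SHY, WJ}

def strip_controls(text):
    """Remove the three invisible characters, e.g. before searching or sorting."""
    return "".join(ch for ch in text if ch not in CONTROLS)

def code_points(text):
    """Return the text as a list of U+XXXX strings for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(case1)[5:9])
print(code_points(case2)[5:9])
```

    Both strings reduce to the same plain word once the invisible characters
    are stripped, which is why the distinction between them matters only to the
    hyphenator and the renderer.
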
    I am not inventing things. This is a "grey area" where something is not
    clearly specified, and given the current implementations, I still see no
    clear difference between the effects of ZWNJ and WJ, how to use them, and
    what they are effectively preventing or enforcing. If WJ should effectively
    prevent a ligature, then it should be specified (and using ZWNJ in
    alternative (2) above will NEVER be needed).

    My message contained NOTHING that would let someone think that it was a
    "recommendation" or an interpretation. You should have read it as a QUESTION
    open for discussion.
