Re: Errors in TUS Figure 15.2?

From: Peter Kirk
Date: Mon Aug 02 2004 - 11:14:40 CDT


    On 02/08/2004 13:12, Antoine Leca wrote:

    > ...
    >However, if I can agree with you about the area being fuzzy when it comes to
    >*ZWJ* and its numerous uses and some abuses (like Devanagari half-forms),
    >the verdict is not anywhere as bad about ZWNJ.
    >Behaviour of ZWNJ is consistent in about any place, and the correct
    >explanation is the one that is, among others, in chapter 15, that is that
    >ZWNJ restricts rendering to unconnected and unligatured forms (or prevent
    >use of any connected form or ligature, if you prefer), where possible.

    I agree that the situation with ZWJ is more complex than that with ZWNJ.
    But there is still uncertainty concerning ZWNJ, because it is unclear
    what is actually considered a "ligature", and so what exactly may be
    broken by ZWNJ.

    In discussions on my Holam proposal, John Hudson wrote:

    > [Note that on Unicode lists I tend to use the term ligature in a
    > purely technical sense: a single glyph representing two or more
    > characters. This says nothing about the form of that glyph. Discussion
    > of ligatures in complex scripts can become confusing unless this
    > strictly technical definition is kept in mind. It helps to remember
    > that when you are looking at rendered text, what *looks* like a
    > ligature -- i.e. two or more conjoined forms -- may or may not in fact
    > be a single glyph.]

    He clarified later that it is irrelevant to him whether a glyph consists
    of a single continuous block or is graphically equivalent to a base
    character plus a diacritical mark; all that matters is how it is

    But other experts use a very different definition of "ligature" which is
    apparently restricted to glyphs with a particular *form*, perhaps John
    Hudson's "two or more conjoined forms". This definition apparently
    excludes combinations of a base character with a diacritical mark, even
    when these are represented as two Unicode characters (i.e. not
    precomposed) but are implemented with a single glyph e.g. by
    substitution of a presentation form. On this latter definition, the
    glyph for the alphabetic presentation form U+FB4B HEBREW LETTER VAV WITH
    HOLAM cannot be considered a ligature, even though it is used, and is
    automatically substituted by rendering engines e.g. Uniscribe, only (in
    all normalisation forms) to represent the combination of two characters
    <VAV, HOLAM>.
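
    The normalisation claim can be checked with a minimal Python sketch
    using the standard unicodedata module. The variable and function names
    are my own, and the result depends on the Unicode data shipped with the
    interpreter; it says nothing about what any rendering engine does.

```python
import unicodedata

# Code points under discussion (names in comments are the Unicode character names).
VAV = "\u05D5"             # HEBREW LETTER VAV
HOLAM = "\u05B9"           # HEBREW POINT HOLAM
VAV_WITH_HOLAM = "\uFB4B"  # HEBREW LETTER VAV WITH HOLAM (presentation form)

def normalised_forms(text):
    """Return the text in each of the four Unicode normalisation forms."""
    return {form: unicodedata.normalize(form, text)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

forms = normalised_forms(VAV_WITH_HOLAM)
for form, result in forms.items():
    # Print each form as a list of code points for easy comparison.
    print(form, [f"U+{ord(ch):04X}" for ch in result])
```

    Every form comes out as the two-character sequence, so U+FB4B never
    survives normalisation as a single character.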

    The situation is even more confused in that some Unicode characters,
    e.g. U+0152 LATIN CAPITAL LIGATURE OE, are called LIGATUREs in their
    character names but are unambiguously single Unicode characters (e.g.
    they have no decomposition even for compatibility). (These are in
    addition to the characters named LIGATURE in the Alphabetic Presentation
    Forms block, which mostly have compatibility decompositions.)
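
    The difference between the two kinds of LIGATURE character is visible
    in the character database. Here is a small Python sketch, again with
    the standard unicodedata module; the helper name describe is only for
    illustration, and U+FB01 is simply one example I picked from the
    Alphabetic Presentation Forms block.

```python
import unicodedata

def describe(ch):
    """Return (character name, decomposition mapping) for one character."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# A character named LIGATURE that has no decomposition at all.
oe = describe("\u0152")
# A character named LIGATURE that has a compatibility decomposition.
fi = describe("\uFB01")

for name, decomposition in (oe, fi):
    # An empty decomposition string means the character does not decompose.
    print(name, "->", decomposition or "(no decomposition)")
```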

    The Unicode definition in the TUS glossary seems ambiguous.
    Here it is:

    > Ligature. A glyph representing a combination of two or more
    > characters. In the Latin script,
    > there are only a few in modern use, such as the ligatures between f
    > and i (= fi) or f
    > and l (= fl). Other scripts make use of many ligatures, depending on
    > the font and style.

    The first sentence would seem to confirm John Hudson's definition, for a
    "glyph" is defined in terms of rendering engine implementation rather
    than graphical identity or continuity. But the comment that there are
    only a few ligatures in modern use in Latin script seems to restrict the
    concept to certain graphical forms without making a proper definition.

    So the uncertain point is, what exactly are the "ligatures" whose
    formation ZWNJ should inhibit? Are they the technical ligatures as
    understood by John Hudson, or are they the undefined formal ligatures or
    conjoined forms?

    Which brings me back to the specific debate over the Holam proposals: Is
    it a proper use of ZWNJ to block the mapping of the character sequence
    <VAV, HOLAM> on to the glyph for the alphabetic presentation form U+FB4B
    HEBREW LETTER VAV WITH HOLAM, so that the HOLAM dot is positioned in its
    regular top left position relative to the base character, rather than
    the irregular (top centre or top right) place in the alphabetic
    presentation form?
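
    To make the question concrete, here is a minimal Python sketch of the
    two encodings being compared. It only shows that the ZWNJ survives
    normalisation, so that the distinction would be preserved in plain
    text; whether a rendering engine should treat it as blocking the
    substitution is exactly the open question. The variable names are mine.

```python
import unicodedata

VAV = "\u05D5"    # HEBREW LETTER VAV
HOLAM = "\u05B9"  # HEBREW POINT HOLAM
ZWNJ = "\u200C"   # ZERO WIDTH NON-JOINER

plain = VAV + HOLAM          # sequence that engines may map to the U+FB4B glyph
marked = VAV + ZWNJ + HOLAM  # sequence under discussion, with ZWNJ inserted

# Check, for each normalisation form, that the marked sequence is unchanged.
survives = {form: unicodedata.normalize(form, marked) == marked
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

print([f"U+{ord(ch):04X}" for ch in marked])
print(survives)
```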

    >>Another argument against our proposal is that by defining
    >>ZWNJ as breaking a ligature I am specifying implementation.
    >This is a dubious argument. Unicode specifies encodings. When two different
    >"meanings" are identified, different encodings are requested, so it is a
    >task for Unicode.
    >OTOH, if there is no underlying difference and the matter is purely of
    >presentation (like the aspect of a, like a reversed e or like a o with left
    >stem), then Unicode is not to be involved.
    >I know the border is fuzzy. ;-) or :-(.
    >Here, the fact it ligates or no does mean something (and this is the hard
    >part of the demonstration) is what should be examined. How it is implemented
    >is largely irrelevant (in fact, it is relevant when the result is *not*

    There is a separate issue of whether it is proper to use ZWNJ or ZWJ for
    a semantically significant distinction. It is arguable whether the Holam
    distinction is actually semantic, although it does need to be made in
    plain text for proper exact typography. But then there are other
    distinctions made by ZWNJ e.g. in Persian which are certainly
    semantically significant.

    My proposal was criticised at one point for restricting how something
    could be implemented. I had demonstrated that there was one feasible
    implementation strategy, and hence that the proposal is implementable.
    Is it really necessary to demonstrate that there is more than one
    feasible strategy so that implementers have a choice? In any case, the
    restriction to one strategy was not imposed by the proposal or by TUS,
    but by the rendering system (OpenType) and particular implementations of
    it, which had the effect of restricting the font implementer's options.

    >OTOH, regarding your problem, I should point out that the Bengali's
    >precedent is anything but something that should be taken as example: it
    >appears to me as an ad-hoc solution built in a hurry, that happened to fit
    >well with certain technical implementations; it is a nightmare to handle for
    >others; and now there is on the table a proposal, PR-37, which among other
    >things will (try to) remove this hack and replace it with another, more
    >orthogonal (using ZWJ).

    Thanks for your advice about PR-37. I realised after including this
    example in the draft Holam proposal that it is in fact controversial.
    However, it seems that the controversy is over whether to use ZWJ or
    ZWNJ; the principle seems to be accepted that one or other may be used,
    and in this position between a base character and a combining mark. The
    UTC obviously needs to decide this issue once and for all, and then
    implementers will need to adjust their implementations to fit. Any
    adjustments are likely to make things easier also for implementation of
    my Holam proposal.

    No one, as far as I know, has proposed a resolution of the Bengali
    ligature issue by defining a new Unicode character. Why not? Presumably
    because this would be a breach of the character/glyph model. Very
    similar principles apply to the Holam case. Use of ZWNJ has been
    proposed because it seems to fit Unicode definitions better. But I would
    not object if the UTC preferred a representation with ZWJ for continued
    compatibility with the Bengali case, especially if this solves actual
    implementation difficulties. My objection to a new character solution is
    basically that it breaks the character/glyph model by defining a new
    character for what is no more than a glyph variant.

    Peter Kirk
