Re: Visarga, ardhavisarga and anusvara -- combining marks or not?

From: Asmus Freytag
Date: Tue Aug 25 2009 - 17:00:44 CDT

    On 8/25/2009 12:53 PM, Kenneth Whistler wrote:
    > Asmus said:
    >> The third approach would leave the actual assignments in
    >> place, but achieves the same effect by a highly visible effort to
    >> document the improved understanding of what it means
    >> for a character to have classification Mc.
    >> Unlike the first option, this would not be a case-by-case
    >> annotation of a few problematic characters in diverse
    >> script chapters, but would have to be more up-front.
    > And I would second this third approach. ;-)
    The success of this third approach depends on it being able to rattle
    people's naive understanding that equates all combining marks with
    graphically combining characters, and more specifically treating any of
    the gc=Mc characters as if they were non-spacing marks with glyphs of
    positive advance width.
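
    The distinction is visible directly in the character properties. A
    minimal sketch, assuming Python's standard `unicodedata` module (the
    helper name `gc_of` is hypothetical and the sample characters are chosen
    only for illustration): both gc=Mn and gc=Mc are combining marks, but
    only Mn is non-spacing.

```python
import unicodedata

def gc_of(ch: str) -> str:
    """Return the General_Category (gc) value of a single character."""
    return unicodedata.category(ch)

# Sample characters picked for illustration only.
samples = {
    "COMBINING ACUTE ACCENT": "\u0301",    # non-spacing mark
    "DEVANAGARI SIGN ANUSVARA": "\u0902",  # non-spacing mark
    "DEVANAGARI SIGN VISARGA": "\u0903",   # spacing combining mark
    "DEVANAGARI LETTER KA": "\u0915",      # ordinary letter
}

for name, ch in samples.items():
    print(f"U+{ord(ch):04X} {name}: gc={gc_of(ch)}")
```

    A renderer that branches only on "is it a combining mark?" would treat
    the visarga like the anusvara; branching on the gc value keeps them
    apart.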

    The current status, nibbling at the margins, has not been successful,
    otherwise there wouldn't be as many problems.

    In that context, getting information on *specific* characters is not
    contributing to the proposed solution, because the problem is generic in
    nature. It's just more "nibbling at the margins". (However useful it is
    otherwise in documenting the scripts' specifics.)

    What I was hoping for is that you go beyond seconding this in E-mail and
    continue to spearhead a revision of the text, going beyond the
    foundation you laid in section 3.6.

    First, it would be useful to add to D55 the general category (gc=Mc).
    Second, it would be useful to either mention the sub-types in comments
    (split, left side, generic spacing) or to define them at this point with
    actual definitions. Then you can put a note there, warning that spacing
    marks that aren't special (what I've called "generic") don't combine
    with the base character and should be rendered like ordinary spacing
    characters of that script.
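
    The three sub-types could be sketched as a simple classifier. This is
    an illustration only: the tables `LEFT_SIDE` and `SPLIT` and the
    function `mc_subtype` are hypothetical, hand-picked and far from
    complete, and are not taken from the standard. Anything with gc=Mc that
    is not listed as special falls through to "generic".

```python
import unicodedata

# Hand-picked examples, NOT a complete or normative list.
LEFT_SIDE = {
    "\u093F",  # DEVANAGARI VOWEL SIGN I
    "\u09BF",  # BENGALI VOWEL SIGN I
    "\u0BC6",  # TAMIL VOWEL SIGN E
}
SPLIT = {
    "\u09CB",  # BENGALI VOWEL SIGN O
    "\u0BCA",  # TAMIL VOWEL SIGN O
}

def mc_subtype(ch):
    """Return 'split', 'left-side' or 'generic' for a gc=Mc character.

    Returns None for characters that are not spacing combining marks.
    """
    if unicodedata.category(ch) != "Mc":
        return None
    if ch in SPLIT:
        return "split"
    if ch in LEFT_SIDE:
        return "left-side"
    return "generic"

print(mc_subtype("\u0903"))  # DEVANAGARI SIGN VISARGA
print(mc_subtype("\u093F"))  # DEVANAGARI VOWEL SIGN I
```

    Only the "split" and "left-side" cases need special handling relative
    to the base; the fall-through case is the one the warning note would
    address.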

    For example, section 4.1 discusses some sub-types of combining marks; a
    short discussion there of both the generic non-spacing and the set of
    generic spacing combining marks, and their issues, would be useful. I
    know that 4.1 came from the desire to address the normalization issues
    of combining class, but that's not apparent to the reader. It needs to
    cover all types of combining classes and be given cross links to all
    other descriptions of combining classes and how to handle them. [This is
    more important if TUS is forever online.]

    Section 5.12 (which is about nonspacing marks) uses the terms combining
    mark and nonspacing mark interchangeably. At that point, a pointer to
    discussion of *other* types of combining marks, esp. the "spacing"
    versions is needed.

    Alternatively, even a short section, "strategies for handling other
    combining characters", would serve.

    Section 7.9 entitled "combining marks" could be more explicit in that
    the discussion is only (or primarily) for combining marks of types found
    in European alphabetic scripts, and be more forceful and up-front (that
    is, in the opening section) in mentioning that while some aspects of
    combining marks are generic, the rendering rules for other scripts (and
    other types of combining marks) are different.

    And/or add a short subsubsection that points to other types of
    combining marks (spacing, subjoined, etc.) by their type and completes
    the cursory overview, so that the section can be read as an introduction
    to the topic. Mention of spacing combining characters is especially
    apropos in Section 7.9, because it talks about spacing clones of
    nonspacing marks, which embodies another use of the word "spacing".

    In chapter 9, I note the absence of any "generic" spacing character in
    the examples for the rendering rules (the one and only such character
    occurs in the example for the bindu, so its own rendering behavior isn't
    the one that's discussed).

    A new R rule should be added for "generic" spacing marks, that makes
    clear that these are laid out just like "Lo".
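
    What such a rule amounts to can be shown with a toy layout pass. This
    is a sketch under strong assumptions, not a rendering algorithm: every
    glyph gets the same hypothetical advance width, the input is assumed to
    contain no left-side or split marks, and conjunct formation and
    reordering are ignored. The function name `layout` is made up for the
    example. Non-spacing marks (gc=Mn, Me) are stacked on the preceding
    character without moving the pen; everything else, including generic
    gc=Mc marks, takes its own advance exactly like gc=Lo.

```python
import unicodedata

def layout(text, advance=1.0):
    """Return a list of (char, x) pairs for a toy left-to-right layout."""
    pen = 0.0        # where the next spacing glyph will be placed
    anchor_x = 0.0   # position of the most recent spacing glyph
    placed = []
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            # Non-spacing: attach to the previous spacing glyph, no advance.
            placed.append((ch, anchor_x))
        else:
            # Letters and generic spacing marks: own slot, pen moves on.
            placed.append((ch, pen))
            anchor_x = pen
            pen += advance
    return placed

# KA + ANUSVARA (Mn) + VISARGA (Mc)
for ch, x in layout("\u0915\u0902\u0903"):
    print(f"U+{ord(ch):04X} at x={x}")
```

    In this run the anusvara shares the position of KA, while the visarga
    is placed one advance to the right, as an ordinary letter would be.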

    Ditto for any comparable discussion of other scripts containing
    "generic" Mc characters. That's just for starters.

    After that is done, it would indeed be useful to document individual

    > It would be very useful to have a written explanation of
    > the behavior of visarga and ardhavisarga to help guide
    > rendering implementations. Note that there are many
    > many extensions for Vedic added in Unicode 5.2, and
    > the addition of the ardhavisarga is not the only character
    > which implementations will need new information about
    > in order to get best display behavior -- but it is
    > a good place to start.
    > Shriramana Sharma's discussion which started this thread,
    > shorn of assumptions about what "should" or "should not"
    > be a combining mark, and instead focussing on the actual
    > display behavior required, could seed such a written
    > explanation. It could start existence as a FAQ (or
    > set of FAQ entries) or a UTN -- and if it proves helpful,
    > then be reworked to incorporate it as appropriate in
    > the relevant sections of the standard, if the UTC approves
    > heading in that direction.
    > --Ken
