Re: Missing character: Combining Up Tack Above

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 30 2007 - 15:52:55 CST

    Asmus wrote:

    > Just because printers in the past grabbed whatever combination worked,
    > is not good guidance as to the suitability of using combinations.

    Of course. I was using an analysis of the typography to help
    determine how the forms in question were made, and in turn using
    that as another clue to what the concept of the mark was (i.e. a
    modification of a macron), to add to the information provided
    by the paradigmatic pattern of its use and the explicit annotation
    of the intent of the mark.

    > If
    > the underlying intent is to create a new 'entity' then the requirement
    > is to encode that entity.

    I gotta take issue with that. We can impute that the underlying
    intent was to create a new entity (the modified macron to indicate
    a modified pronunciation of an English "long" vowel). But that
    doesn't lead directly to a requirement to encode that entity.

    One has to first pass the hurdle of determining whether the
    entity *deserves* encoding via encoded characters, or is
    better treated via markup of some sort, or represents a nonce
    usage that doesn't rise to the level of requiring international
    standardization.

    Even assuming consensus is reached that the entity *does* require
    a character representation (as this one probably does) and is
    important enough to bother with (as this one might well be),
    you then simply have a requirement for *representation* in
    terms of characters -- which doesn't force the conclusion that
    the entity itself must be encoded as a character.

    Obvious exceptions: the discovery of deliberate ligatures in
    texts. Those constitute textual entities and you may well
    need to represent them for digital text, but that doesn't
    require you to go directly to encoding the ligature as a
    character.

    > Some entities decompose, but our rule for
    > decompositions is not merely that a similar visual effect *can* be
    > produced, but that the elements of the decomposition, when combined,
    > correctly form the new entity.

    I agree with that. I'm not suggesting that we start treating
    the Unicode characters as a visual lego set.

    >
    > I will not quibble with Ken's analysis that the new entity is not an Up
    > Tack. And I will not quibble with the fact that the entity is a
    > modification of a MACRON. I'll take these as read, for the sake of the
    > following argument.

    O.k.

    > I remain very much unconvinced that the decoration
    > on that macron is correctly represented by a combining vertical line
    > above. I can see no convincing evidence in the discussion.

    Nor am I. What I am convinced of, instead, is that the modification
    of the macron is a vertical tick diacritic on the macron. This
    could be proven, I suspect, if anybody could turn up the
    presumably manuscript material from which the books in question
    were typeset. That is unlikely a century later, however, for
    somewhat obscure material like this.

    But in any case, the question devolves to determining whether it
    makes sense to posit that a graphological diacritic with a
    roughly apostrophic shape, applied to another diacritic, deserves
    treatment in the Unicode character encoding as a *character*
    itself, or whether, like descender diacritics on Cyrillic letters,
    the diacritic nature of the mark doesn't lend itself to
    separate character encoding -- leading to the conclusion that
    the diacritic modified base should simply be encoded separately
    as a unit.

    >
    > Because of that, I see as viable alternatives either, the encoding of a
    > character to correspond with the entity as a whole,

    Which is what I would be inclined to in this particular instance,
    as the easier option to implement and explain.

    > or the encoding of
    > the correct modification for the macron.

    The correct modification of the macron is a vertical tick added
    to the top of it.

    You don't get that for free, because there already is a combining
    diacritic vertical tick above, namely, U+030D COMBINING VERTICAL
    LINE ABOVE.

    You either claim:

      A. That isn't it (your straw position here), so a separate
         mark needs to be encoded.
         
    or

      B. That is it.
      
    In case A, you end up introducing another problematical confusable
    issue. By claiming functional distinction for two marks that would
    be visually virtually indistinguishable, you end up with the same
    kind of confusion that occurs any time visually indistinguishable
    characters are claimed to be distinct: ordinary users will have
    trouble determining which to use when, and you will end up with
    data corruption as a result.

    In case B, you end up with the possibility (or likelihood) that
    presentation of marks in combination won't result in the exact
    shapes expected, and the need to specify rules for glyphic
    combination in particular contexts.

    Case A is more difficult to justify paradigmatically. You end
    up with a mark that looks like X but only occurs in context Y,
    and another mark that looks like X but only occurs in context Z,
    when contexts Y and Z don't overlap. In particular, you have
    a vertical tick that is applied to base vowels (U+030D) and
    another vertical tick that is applied to macrons (U+XXXX).

    Case B is more difficult to justify practically, because it
    potentially requires more font smarts for contextual shaping
    of a single character, rather than simply designing the
    new character (U+XXXX) to fit correctly on any given font's
    macron (U+0304), without requiring any other contextual
    shaping beyond that already perhaps required for the macron
    itself.
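
    For anyone who wants to poke at what case B would mean in
    data, here is a minimal Python sketch using the standard
    unicodedata module. The helper name describe() and the choice
    of "a" as the base vowel are mine, purely for illustration.
    It only shows how the sequence <vowel, U+0304, U+030D> behaves
    under normalization; it says nothing about how any font would
    actually render the result.

```python
import unicodedata

MACRON = "\u0304"         # COMBINING MACRON
VERTICAL_LINE = "\u030D"  # COMBINING VERTICAL LINE ABOVE

def describe(ch):
    """Return (code point label, character name, canonical combining class)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# Case B sequence: base vowel, then macron, then the tick applied last.
macron_then_tick = "a" + MACRON + VERTICAL_LINE
# The opposite order: tick applied to the vowel, then a macron above it.
tick_then_macron = "a" + VERTICAL_LINE + MACRON

for ch in (MACRON, VERTICAL_LINE):
    print(describe(ch))

# NFC composes the vowel and macron; the tick remains a separate character.
nfc = unicodedata.normalize("NFC", macron_then_tick)
print([f"U+{ord(c):04X}" for c in nfc])

# Both marks share a combining class, so normalization never reorders them:
# the two orders are distinct strings, not canonical equivalents.
same = (unicodedata.normalize("NFD", macron_then_tick)
        == unicodedata.normalize("NFD", tick_then_macron))
print(same)
```

    Since the two marks have the same combining class, the order
    in the data is preserved, so "tick on the macron" and "macron
    over a ticked vowel" stay distinguishable as sequences. Getting
    the glyphs right for the former is still entirely up to the font.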

    > Overall, placement of multiple
    > combining marks strikes me as fragile (except in the context of
    > strong, well supported language-based requirements such as for
    > Vietnamese, Polytonic Greek and ignoring scripts with so called 'complex
    > layout' such as Arabic and similar cases for the moment). Because of
    > this, I think that the best user experience might be generated by
    > encoding the entity as such.

    I agree with that assessment in the end.

    But I think the stronger precedent to look towards here is
    the handling of letter diacritics when the diacritic form
    itself is a modification of the letter itself (descenders,
    bars through, hooks, and so on), rather than being a
    free-floating diacritic above or below the entire letter
    form. The UTC precedent in such instances is to acknowledge
    that a diacritic modification is present, but to encode the
    entire modified letter as a unit.

    And the UTC has precedents in place for handling diacritic
    modification of marks themselves in an analogous way.
    The recently accepted Lithuanian tone marks, U+1DCB
    COMBINING BREVE-MACRON and U+1DCC COMBINING MACRON-BREVE
    are themselves obviously simply graphological combinations
    of two existing combining marks that are already encoded
    as characters. Yet they were separately encoded as
    unitary characters. Now in those cases, the combination
    of the macron and the breve was a graphical side-by-side
    linking, for which encoding simply as sequences of the
    existing marks wouldn't make much sense. But in principle,
    other than the placement of the diacritic modification
    above, rather than side-to-side, the MACRON-TICK is
    not much different in the problem it presents for
    encoding.
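
    As a rough check on the "unitary characters" point, the
    following Python sketch (again just the standard unicodedata
    module; the helper name unitary_info() is made up for this
    example) looks up the two Lithuanian marks and confirms that
    they carry no decomposition into macron and breve. It assumes
    a Python whose Unicode data includes these code points.

```python
import unicodedata

def unitary_info(code_point):
    """Return (name, decomposition mapping, NFD form) for one code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.decomposition(ch),   # empty string means no decomposition
        unicodedata.normalize("NFD", ch),
    )

for cp in (0x1DCB, 0x1DCC):
    name, decomposition, nfd = unitary_info(cp)
    # A unitary mark is unchanged by NFD: it stays one code point.
    print(f"U+{cp:04X}", name, repr(decomposition), len(nfd))
```

    A MACRON-TICK encoded along the same lines would presumably
    behave the same way: one code point, no decomposition, no
    equivalence to any sequence of existing marks.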

    --Ken


