Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: William Overington (
Date: Wed Sep 25 2002 - 05:01:28 EDT

    Peter Constable wrote as follows, responding to Kenneth Whistler.

    >I'm saying that *if* there is a need for digital data representation of
    >the things in the ALA-LC transliteration (which, like you, I consider not
    >to have yet been demonstrated), then I wouldn't want to suggest it can be
    >represented as the sequence
    >    <U+0074, U+0361, U+0073, U+0307>
    >since that has an existing, distinct presentation specified by the
    >Standard, viz.
    >    {t-s-dot-tie-ligature} glyph
    >and it could create problems to have two distinct text forms having the
    >same encoded representation.
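
    As a hedged illustration of the sequence quoted above, the following
    Python sketch uses only the standard unicodedata module to list each code
    point's name and canonical combining class. The helper name describe is
    hypothetical, and the data reported depends on the Unicode version bundled
    with the interpreter.

```python
import unicodedata

# The sequence from the quoted message: t, double inverted breve, s, dot above.
SEQUENCE = "\u0074\u0361\u0073\u0307"


def describe(text):
    """Return (code point, name, canonical combining class) per character.

    Hypothetical helper for illustration; not part of any standard API.
    """
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


for code_point, name, ccc in describe(SEQUENCE):
    print(code_point, name, ccc)
```

    In the encoded text U+0307 directly follows the base letter U+0073, so at
    the character level the dot is associated with the s. Where a font places
    it relative to the tie is a rendering matter that this sketch does not
    address.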

    I am thinking that if the bibliographic transliteration standards people do
    wish to continue the use of the ts ligature with dot above and the ng
    ligature with dot above, then it may well be that a new combining character
    would be a good solution. Suppose that that character were encoded at U+03XY
    for some specific value XY; then the following sequence would do the job
    nicely for the ts ligature with dot above.

    U+0074 U+03XY U+0073

    I do feel that, once the bibliographic standards people have decided which
    transliteration characters they need, it would be a good idea for the
    Unicode Consortium to publish a guide to the sequences needed in practice.
    This would serve as practical guidance to people authoring fonts, so that
    the dozen or so required sequences may, if the author of any font so
    chooses, be included in various font tables so as to produce elegant
    display glyphs in printed work.

    This is perhaps not so much of an edge case of an edge case as may be
    thought by some people.

    Consider that a printing house typesets and prints books on various topics
    for various publishers, under paid contract.

    Yes, most of the books do not involve transliteration in bibliographic form
    of names originally written in Cyrillic.

    Yet over a number of years with many books passing through the printing
    house, one or two of them may contain such transliterations, perhaps a book
    on mathematical functions or physics or something like that with a
    bibliographic index, and so what may seem like an edge case of an edge case
    in this discussion could easily become a problem which needs to be solvable
    with a mainstream font. Perhaps a Murphy's law style event will occur in
    that the name of some famous mathematician will be found to have one of the
    rarer ligatures in it, though perhaps Murphy's law will act so that, because
    I have suggested the possibility, an exhaustive search of all known authors
    will not produce one such example! It is what I term a software unicorn.
    It may not appear to be necessary to fix it now and fixing it will take an
    effort, yet if it is not fixed then the problem could become large if a
    typesetter cannot set to copy an index of a book where the content of that
    index has been produced in total accordance with a Library of Congress
    bibliographic standards document.

    It may perhaps be that the dot above the ligature was not any indication of
    something to do with the phonetics of the word but was originally just a
    distinguishing mark made with a pen on a typewritten cardboard record card
    on an ad hoc basis when the matter first arose. I have no knowledge of
    phonetics in this manner but note that U+0307 does have a use as a
    derivative (Newtonian notation) which is something with which I am familiar
    as I used to be involved with analogue and hybrid computing, though it was
    not a notation which I used myself. Perhaps the dot above the ts ligature
    was used just as a differencing mark along the lines of the way that in
    heraldry a coat of arms of a second son or a younger brother might be

    While on the topic, how would the following sequence be displayed please?

    U+0074 U+0361 U+0073 ZWJ U+0307
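
    One thing that can be checked without a font is how normalization treats
    the sequence with and without ZWJ (U+200D). The sketch below assumes
    Python's standard unicodedata module, and the helper name code_points is
    hypothetical. It says nothing about how any particular rendering engine or
    font would draw either sequence.

```python
import unicodedata

# The sequence discussed earlier in the thread, and the variant asked about here.
WITHOUT_ZWJ = "\u0074\u0361\u0073\u0307"
WITH_ZWJ = "\u0074\u0361\u0073\u200D\u0307"


def code_points(text):
    """Format a string as space-separated U+XXXX values (illustrative helper)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for label, text in (("without ZWJ", WITHOUT_ZWJ), ("with ZWJ", WITH_ZWJ)):
    # NFC applies canonical composition wherever a precomposed form exists.
    nfc = unicodedata.normalize("NFC", text)
    print(f"{label}: {code_points(text)} -> {code_points(nfc)}")
```

    Without ZWJ, NFC composes U+0073 U+0307 into U+1E61 LATIN SMALL LETTER S
    WITH DOT ABOVE, which shows the dot being treated as belonging to the s.
    With ZWJ in between, no such composition takes place and the sequence is
    left unchanged.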

    I am not suggesting this for bibliographic work, just wondering. For the
    bibliographic work I feel that a new character, COMBINING DOUBLE INVERTED
    BREVE WITH DOT ABOVE, might be a good solution.

    William Overington

    25 September 2002
