Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: William Overington
Date: Thu Sep 19 2002 - 13:48:54 EDT


    Kenneth Whistler wrote, as part of a longer response to my original posting.

    >William Overington asked:


    >> I wonder if consideration could please be given as to whether this matter
    >> should be left unregulated or whether some level of regulation should be
    >> used.

    >I think this should depend first on a determination of whether there
    >is a demonstrated need for an actual representation of these sequences --
    >which ought to be determined by the people responsible for the
    >data stores which might contain them, namely the online bibliographic

    [further remarks here snipped]

    Actually, "this matter" referred to the following, which is more general
    than just the romanization of Cyrillic characters.


    It seems to me that this matter of sequences of combining characters being
    used to give glyphs where different meanings are needed other than just
    locally and that glyphs for such meanings are only correctly displayed if a
    particular rendering system or a particular font are used touches at the
    roots of the Unicode system.

    It seems to me that the glyphs for such sequences are being left as if they
    were a Private Use Area unregulated system. I recognize that fonts have
    glyph variations in that, say, an Arial letter b looks different to a
    Bookman Old Style letter b, yet in that case the meaning is the same.

    I wonder if consideration could please be given as to whether this matter
    should be left unregulated or whether some level of regulation should be

    end quote

    In another post in the same thread, Ken states as follows.


    But that wasn't my point. There is no particular evidence
    that the ALA-LC conventions with the dot above the graphic
    ligature ties is in widespread use for romanizations of these
    particular languages, that I can see. So the *urgency* of
    solving this problem isn't there, unless the LC/library/bibliographic
    community comes to the UTC and indicates that they have a data interchange
    problem with USMARC records using ANSEL that requires a clear
    representation solution in Unicode.

    end quote

    The question on which I am seeking discussion is whether, under the
    present rules, any bibliographic community would ever need to approach the
    Unicode Consortium over such a matter, and, if it would not, whether it
    would be better to seek to change the rules now.

    It is convenient to consider the situation in relation to the romanization
    of Cyrillic characters, though similar considerations may also apply to
    topics such as the Byzantine legal texts, and perhaps to others as well.

    For example, please suppose that there were a committee called the
    Romanization of Cyrillic Committee. Suppose that that committee were to
    have various meetings and decide that for a ts romanization ligature that

    t U+FE20 s U+FE21

    suits them fine, and that for the ts with a dot above romanization ligature

    t U+FE20 s U+FE21 U+0307

    suits them fine, and that it then publishes a list of assignments and
    example glyphs. The glyph for the ts with a dot above ligature in that
    publication has the dot above the curved line, centred horizontally. Only
    later does someone with expert knowledge of the Unicode standard see the
    published list and notice that the glyph shown in the document is not, in
    fact, how that sequence should appear according to the Unicode standard.
    By this time, many copies of the document have been sent to libraries
    around the world, and databases have started to be converted to what that
    publication may well be calling "the new Unicode based system".

    This might sound impossible, yet what is the present alternative? There is
    no way to formally register such sequences with the Unicode Consortium!
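    The following minimal Python sketch makes the two sequences from the hypothetical above concrete. It uses only the standard library's unicodedata module. It lists each code point with its name and canonical combining class and checks that normalization leaves both sequences unchanged. As an extra illustration that is not part of the committee example, it also shows that placing U+0307 before U+FE21 gives a different sequence that is not canonically equivalent. The names TS, TS_DOT, ALT and describe are illustrative only. Code alone cannot show how a sequence renders; the sketch only shows that the exact order of code points matters.

```python
import unicodedata
from typing import List

# The two sequences agreed by the hypothetical committee.
TS = "t\uFE20s\uFE21"            # ts romanization ligature
TS_DOT = "t\uFE20s\uFE21\u0307"  # ts ligature with dot above


def describe(seq: str) -> List[str]:
    """Return 'U+XXXX NAME (ccc=N)' for each code point in seq."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} (ccc={unicodedata.combining(ch)})"
        for ch in seq
    ]


for line in describe(TS_DOT):
    print(line)

# No precomposed characters exist for these, so normalization leaves them as-is.
for seq in (TS, TS_DOT):
    assert unicodedata.normalize("NFC", seq) == seq
    assert unicodedata.normalize("NFD", seq) == seq

# Illustrative alternative ordering: dot placed before the right half of the tie.
ALT = "t\uFE20s\u0307\uFE21"
# Here U+0307 directly follows s, so NFC composes them into U+1E61.
print(unicodedata.normalize("NFC", ALT) == "t\uFE20\u1E61\uFE21")  # True
# U+FE21 and U+0307 share combining class 230, so they are never reordered,
# and the two orderings are therefore not canonically equivalent.
print(unicodedata.normalize("NFD", ALT) == unicodedata.normalize("NFD", TS_DOT))  # False
```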

    I suggest that it might be a good idea to have an infrastructure whereby the
    Unicode Consortium registers sequences of combining characters and example
    glyphs, categorized as to application.
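    One way such a registry might be modelled is sketched below in Python. Everything here is hypothetical and for illustration only, including the RegisteredSequence and SequenceRegistry names, their fields and the example-glyph file names. No such registry exists at the Unicode Consortium. The sketch keys entries by their NFD form. This is an assumption, chosen so that canonically equivalent spellings of a sequence find the same entry.

```python
import unicodedata
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RegisteredSequence:
    """One entry in a hypothetical registry of combining-character sequences."""
    sequence: str       # the exact code point sequence
    application: str    # application category, e.g. "Romanization of Cyrillic"
    description: str    # human-readable name of the glyph
    example_glyph: str  # hypothetical reference to an example glyph image


class SequenceRegistry:
    """Hypothetical registry mapping sequences to their registered meaning."""

    def __init__(self) -> None:
        self._entries: Dict[str, RegisteredSequence] = {}

    @staticmethod
    def _key(sequence: str) -> str:
        # Key by NFD so canonically equivalent input finds the same entry.
        return unicodedata.normalize("NFD", sequence)

    def register(self, entry: RegisteredSequence) -> None:
        key = self._key(entry.sequence)
        # Refuse to silently redefine a sequence that is already registered.
        if key in self._entries:
            raise ValueError(f"already registered: {entry.description}")
        self._entries[key] = entry

    def lookup(self, sequence: str) -> Optional[RegisteredSequence]:
        return self._entries.get(self._key(sequence))

    def by_application(self, application: str) -> List[RegisteredSequence]:
        return [e for e in self._entries.values() if e.application == application]


registry = SequenceRegistry()
registry.register(RegisteredSequence(
    "t\uFE20s\uFE21", "Romanization of Cyrillic", "ts ligature", "ts.png"))
registry.register(RegisteredSequence(
    "t\uFE20s\uFE21\u0307", "Romanization of Cyrillic",
    "ts ligature with dot above", "ts-dot.png"))

entry = registry.lookup("t\uFE20s\uFE21\u0307")
print(entry.description if entry else "not registered")  # ts ligature with dot above
```

    Refusing duplicate registrations reflects the point of the proposal: once a sequence has been registered for an application, its meaning and example glyph are fixed rather than open to local reinterpretation.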

    This could have far-reaching benefits.

    Suppose, for example, that such an infrastructure existed, and that there is
    a mathematician, M, and a font designer, F, who do not know each other.

    M is writing a research paper on a particular branch of mathematics, where
    one of the key reference papers was written by an author whose name is
    written in Cyrillic characters, yet which name also has a romanized version.
    M finds that the romanization needs a character to represent the ts
    romanization ligature. M is using a word processor to prepare the research
    paper and is keen to insert the ts ligature in a form compatible with the
    standard bibliographic method for romanizing Cyrillic names. How can M
    insert that character into the document?

    Fortunately, M finds that the word processor offers various special
    characters, including a ts ligature, and inserts it in the document. Behind
    the scenes, the word processor inserts the correct Unicode sequence for the
    ts ligature.

    The display is excellent. However, it depends not only on the word
    processor being able to insert the ts ligature sequence, but also on F
    having included glyphs for various sequences used in the romanization of
    Cyrillic characters when updating the design of the mainstream roman font
    R, which F designed. F is pleased to have done that, because if an end
    user chooses to include some romanized Cyrillic in a document set in the
    R font, the iu, IU, ts and TS ligatures (etc.) will all appear in an
    elegant form. F is also pleased that the R font can be used in so many
    areas of application. Besides the sequences for the romanization of
    Cyrillic, F has included ligatures for Byzantine legal texts and for
    various other specialist areas where a general purpose roman font such as
    R might well be used by some end users.

    F has found this quite straightforward to do. F is an expert neither in
    the underlying theory of the romanization of Cyrillic characters nor in
    the encoding of Byzantine legal codes, but F has the advantage of simply
    monitoring the Unicode website. Whenever a new collection of sequences is
    published, F decides whether to include those sequences in the various
    fonts which F looks after.
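    F's routine could be as simple as comparing the published sequences against those a font already has glyphs for. The Python sketch below assumes a hypothetical published list. The iu, IU and TS spellings are assumptions, built by analogy with the ts sequence above. The missing_sequences and code_points helpers are likewise illustrative names, not an existing tool.

```python
import unicodedata
from typing import Dict, Iterable, List, Set

# Hypothetical published collection, grouped by application.
# The iu, IU and TS spellings are assumed by analogy with the ts sequence.
PUBLISHED: Dict[str, Set[str]] = {
    "Romanization of Cyrillic": {
        "t\uFE20s\uFE21",        # ts
        "T\uFE20S\uFE21",        # TS
        "i\uFE20u\uFE21",        # iu
        "I\uFE20U\uFE21",        # IU
        "t\uFE20s\uFE21\u0307",  # ts with dot above
    },
}


def code_points(seq: str) -> str:
    """Format a sequence as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in seq)


def missing_sequences(published: Dict[str, Set[str]],
                      font_sequences: Iterable[str]) -> Dict[str, List[str]]:
    """Return, per application, the published sequences a font lacks glyphs for."""
    # Compare in NFD so a canonically equivalent spelling counts as covered.
    covered = {unicodedata.normalize("NFD", s) for s in font_sequences}
    report: Dict[str, List[str]] = {}
    for application, seqs in published.items():
        missing = sorted(s for s in seqs
                         if unicodedata.normalize("NFD", s) not in covered)
        if missing:
            report[application] = missing
    return report


# A font covering only the plain ts ligature lacks the other four sequences.
for app, seqs in missing_sequences(PUBLISHED, {"t\uFE20s\uFE21"}).items():
    for seq in seqs:
        print(app, code_points(seq))
```

    For a font such as R, which covers every published sequence, the report is empty. For the decorative Byzantine scribe font, F would expect it to list the whole Cyrillic romanization collection.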

    Actually, F has, thus far, included all of the published sequences in the R
    font. However, F has only included a few of the sequences in various other
    fonts. For example, for the sequences for Byzantine legal codes, F included
    special glyphs for each of the sequences in a decorative font based upon the
    handwriting of a Byzantine scribe.

    Stepping back outside the hypothesis, what we have now, even with the best
    quality advice, is no more than the equivalent of legal opinion on what a
    sequence means: registering sequences and their glyphs would be the
    equivalent of a ruling by a court of record.

    For the avoidance of doubt, I am not suggesting that every possible
    sequence of characters be registered. I am simply suggesting that a
    registration procedure might well help the end user community, so that
    authors of documents, font designers and others would all be in step
    regarding which sequences to use for particular applications. They would
    also agree on which sequences to consider including in fonts as producing
    a specific glyph, so that the rendering system need not rely on default
    combinations of combining characters, which might produce a poor
    typographic result.

    I feel that there is presently an opportunity for the Unicode Consortium
    to provide this facility to the end user community. If establishing the
    infrastructure is left too long, perhaps until some specific criterion of
    practical need is met, then typographic chaos may well result, and the
    matter may never be put right because various legacy systems will by then
    be in use.

    So, I ask whether this matter could please be considered.

    William Overington

    19 September 2002

    This archive was generated by hypermail 2.1.5 : Thu Sep 19 2002 - 14:36:00 EDT