RE: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: Marco Cimarosti
Date: Wed Sep 18 2002 - 04:48:41 EDT

William Overington wrote:
> Regarding Ken's response to the Byzantine legal codes matter, it would
> appear possible that the way that the ts ligature with a dot above for
> romanization of Cyrillic could be represented in Unicode
> would be by the following sequence.
> t U+FE20 s U+FE21 U+0307

I think that <U+0307> would only apply to <s U+FE21>, not to the whole
sequence <t U+FE20 s U+FE21>.

Using the COMBINING DOUBLE INVERTED BREVE doesn't make things much better:

        t U+0361 s U+0307

Still, <U+0361> only applies to <t>, and <U+0307> only applies to <s>.

Perhaps a viable approach would be to use the COMBINING GRAPHEME JOINER (to
turn <ts> into a single 'grapheme'), followed by regular combining marks
(as opposed to the "double" clones):

        t U+034F s U+0311 U+0307
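
For anyone who wants to inspect the three candidate sequences side by side,
here is a minimal sketch using Python's standard unicodedata module. The
dictionary labels and the describe() helper are my own names, chosen only for
illustration. The script reports character names and canonical combining
classes; it says nothing about how a given renderer will actually position
the marks.

```python
import unicodedata

# The three candidate sequences discussed above, as lists of code points.
# The labels are informal descriptions, not official terminology.
CANDIDATES = {
    "ligature halves": [0x0074, 0xFE20, 0x0073, 0xFE21, 0x0307],
    "double inverted breve": [0x0074, 0x0361, 0x0073, 0x0307],
    "grapheme joiner": [0x0074, 0x034F, 0x0073, 0x0311, 0x0307],
}


def describe(code_points):
    """Return (U+XXXX, character name, canonical combining class) per code point."""
    return [
        (f"U+{cp:04X}", unicodedata.name(chr(cp)), unicodedata.combining(chr(cp)))
        for cp in code_points
    ]


for label, cps in CANDIDATES.items():
    print(label)
    for row in describe(cps):
        print("   ", *row)
```

A combining class of 0 marks a character that canonical reordering will not
move marks across; non-zero values are the classes used to order combining
marks during normalization.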

> In the recent thread about Byzantine legal codes, the
> following sequences were suggested.
> U+0069 U+0313 U+0301
> U+0055 U+0313
> The second of the above requiring a rendering different from
> what direct reading of the Unicode specification might suggest.

I don't think Unicode really 'specifies' this: it is a glyph issue, and the
details of it are left to the typographers.

The only thing I found about it is on page 27 of the Unicode standard, which
simply states that this behavior may exist in some scripts, and gives an
example with polytonic Greek:

        "Prominent characters that show such override behavior are
associated with specific scripts or alphabets. For example, when used with
the Greek script, the "breathing marks" U+0313 COMBINING COMMA ABOVE (psili)
and U+0314 COMBINING REVERSED COMMA ABOVE (dasia) require that, when used
together with a following acute or grave accent, they be rendered
side-by-side above their base letter rather than the accent marks being
stacked above the breathing marks."

But that passage is not really a "specification", so I don't think it needs
a "correction".

Of course, it could be enhanced by changing "specific scripts or alphabets"
into "specific scripts, alphabets, or languages". And Nick Nicholas's
cross-alphabet example ("Ulpianus") could be inserted as well.
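
To illustrate that the encoding itself carries only the sequence of marks,
here is a small sketch with Python's unicodedata module (the code_points()
helper is a name I made up for the example). Greek alpha with psili and acute
happens to have a precomposed form, so NFC folds the sequence into it; Latin
U followed by U+0313 has none, so it stays a plain sequence. In both cases
the side-by-side placement is, as far as I can tell, something the font and
renderer have to supply.

```python
import unicodedata

# Greek small alpha + psili (U+0313) + acute (U+0301), as in the quoted passage.
greek = "\u03B1\u0313\u0301"
# Latin capital U + U+0313, the second sequence from the Byzantine thread.
latin = "U\u0313"


def code_points(text):
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# NFC composes the Greek sequence into one precomposed character,
# while the Latin sequence has no precomposed form and is left unchanged.
print(code_points(unicodedata.normalize("NFC", greek)))
print(code_points(unicodedata.normalize("NFC", latin)))
```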

> I wonder if consideration could please be given as to whether
> this matter should be left unregulated or whether some level
> of regulation should be used.

It seems to me that the well-known motto "Unicode encodes characters, not
glyphs" implicitly includes an answer to this: as far as possible, the
matter should be left unregulated.

_ Marco

