Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

From: William Overington (WOverington@ngo.globalnet.co.uk)
Date: Wed Sep 18 2002 - 02:31:43 EDT


In the discussion about romanization of Cyrillic ligatures I asked how one
expresses in Unicode the ts ligature with a dot above.

Regarding Ken's response to the Byzantine legal codes matter, it appears
that the ts ligature with a dot above, as used in the romanization of
Cyrillic, could be represented in Unicode by the following sequence.

t U+FE20 s U+FE21 U+0307

The ordinary ts ligature for romanization of Cyrillic is expressed as
follows.

t U+FE20 s U+FE21
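As a minimal sketch, assuming Python's standard unicodedata module, the two sequences above can be built and listed code point by code point. The helper names tie_sequence and describe are hypothetical, introduced only for illustration; the same helper would give the analogous ng sequence with tie_sequence("n", "g", dot=True).

```python
import unicodedata

# Combining characters used in the sequences proposed in this message.
LIG_LEFT = "\uFE20"   # COMBINING LIGATURE LEFT HALF
LIG_RIGHT = "\uFE21"  # COMBINING LIGATURE RIGHT HALF
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

def tie_sequence(first, second, dot=False):
    # Hypothetical helper: wrap two base letters in the ligature halves,
    # optionally followed by a combining dot above.
    seq = first + LIG_LEFT + second + LIG_RIGHT
    if dot:
        seq += DOT_ABOVE
    return seq

def describe(seq):
    # One line per code point: U+XXXX, character name, canonical combining class.
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)} ccc={unicodedata.combining(ch)}"
        for ch in seq
    ]

plain_ts = tie_sequence("t", "s")
dotted_ts = tie_sequence("t", "s", dot=True)

for line in describe(dotted_ts):
    print(line)
```

All three combining characters carry combining class 230, so canonical reordering leaves their relative order unchanged; how the dot is positioned relative to the ligature is left to the font and renderer.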

The second example is from the recent thread on Romanized Cyrillic
bibliographic data.

In the recent thread about Byzantine legal codes, the following sequences
were suggested.

U+0069 U+0313 U+0301

U+0055 U+0313

The second of these requires a rendering different from what a direct
reading of the Unicode specification might suggest.
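One way to check that none of these sequences has a precomposed equivalent is to apply NFC normalization, which leaves each of them unchanged. This is a sketch using Python's standard unicodedata module; the dictionary labels and the code_points helper are hypothetical, chosen only for this illustration.

```python
import unicodedata

# The sequences discussed in this message, written as Python strings.
SEQUENCES = {
    "ts ligature with dot above": "t\uFE20s\uFE21\u0307",
    "ts ligature": "t\uFE20s\uFE21",
    "i, comma above, acute": "\u0069\u0313\u0301",
    "U, comma above": "\u0055\u0313",
}

def code_points(s):
    # Format a string as space-separated U+XXXX code points.
    return " ".join(f"U+{ord(ch):04X}" for ch in s)

for label, seq in SEQUENCES.items():
    nfc = unicodedata.normalize("NFC", seq)
    # An unchanged result means NFC did not fold any of the combining
    # marks into a precomposed character.
    print(f"{label}: {code_points(seq)} -> NFC {code_points(nfc)} (unchanged: {nfc == seq})")
```

For example, in U+0069 U+0313 U+0301 the acute accent cannot compose with i to give U+00ED, because the intervening U+0313 has the same combining class (230) and blocks composition. The sequences therefore stay as sequences, and their appearance depends on how the font and renderer handle them.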

Ken's reply seems to suggest that the display of such sequences would be
renderer-dependent or font-dependent.

It appears to me that the ts ligature with a dot above, and a similar ng
ligature with a dot above, are already needed for the Library of Congress
romanization of Cyrillic system.

The following directory contains many PDF files.

http://lcweb.loc.gov/catdir/cpso/romanization

The ts ligature with a dot above can be found on page 2 of the nonslav.pdf
file. The ng ligature with a dot above can be found on page 13 of the same
file.

Capital letter versions of the two ligatures are needed as well.

The two sequences U+0069 U+0313 U+0301 and U+0055 U+0313 mentioned above,
and possibly others, will be needed for the Byzantine legal codes.

It seems to me that this matter touches the roots of the Unicode system:
sequences of combining characters are being used to produce glyphs whose
meanings are needed more widely than just locally, yet those glyphs are
displayed correctly only if a particular rendering system or a particular
font is used.

It seems to me that the glyphs for such sequences are being left as if they
were an unregulated Private Use Area system. I recognize that fonts have
glyph variations: an Arial letter b, say, looks different to a Bookman Old
Style letter b, yet in that case the meaning is the same.

I wonder if consideration could please be given as to whether this matter
should be left unregulated or whether some level of regulation should be
used.

William Overington

18 September 2002


