Re: Not really about "ASCII and Unicode lifespan" anymore...

From: Alexej Kryukov
Date: Mon Sep 05 2005 - 13:43:55 CDT

    > On 2005.05.20, 20:43, Alexander Kh. wrote:
    > > What I meant is, for example, the strange "semisoft sign" 048C, which
    > > looks the same as Yat 0462. Is that someone's joke?
    > The Standard says, in the notes for U+048D (the lower case form
    > of U+048C), that it is (or was) used in (a?) Kildin Sami orthography.
    > How much of a joke do you think that is?

    I know almost nothing about Kildin Sami, but I can add that in the TeX
    Cyrillic encodings yat' and the semisoft sign are also treated as separate
    characters. The difference between them is especially clear in italic
    fonts, where the semisoft sign looks like a slanted version of its upright
    form, while yat' has quite a different shape, similar to a combination of
    Cyrillic italic "pe" (or Latin "n") and the Cyrillic soft sign U+044C.

    > > Or, for example, letter 047C - Omega with titlo: why a separate letter
    > > if titlo is defined separately?

    This is a really interesting question. If "Omega with titlo" is meant to
    represent just omega with a titlo, then this character is useless, because
    there are more commonly used combinations with titlo, which, however,
    aren't encoded in Unicode as precomposed characters.

    However, I have the impression that this code point was introduced to
    represent quite a different character: the so-called "beautiful
    omega". Originally, "beautiful omega" was just a combination of omega
    with smooth breathing and a circumflex accent, which, however,
    evolved into a specific ornamental design (hence the name),
    so that in printed Church Slavonic books it is difficult even to
    identify the original components. In most editions the diacritical mark
    above "beautiful omega" (sometimes called "great apostrophe") really
    looks similar to some versions of titlo, which may explain why the
    character was included in Unicode under this name.

    I don't know if my version is correct, but it seems to be the only
    possible logical explanation for including this character in Unicode.
    Note that some font developers already identify U+047C/U+047D with
    "beautiful omega" and design these glyphs accordingly.

    > Perhaps because it was included in a pre-1990 standard (GOST?) -- at
    > least that is the reason for the inclusion of hundreds of precomposed
    > characters?

    This guess is surely wrong: no officially approved code page
    ever included any historical Cyrillic glyphs, and neither did GOST. There
    are some semi-official standards, supported by their user
    communities, like T2D in TeX or the so-called Unified Church Slavonic
    (see -- sorry, in Russian only), but
    I am afraid the Unicode people never even studied these standards.

    > But if you ask why U+047C isn't canonically decomposable
    > as U+0460 U+0483 (ditto for lower case) -- well I also would like to
    > know the reason.

    If my guess (explained above) is true, then "beautiful omega" should
    be treated as a character in its own right and not decomposed into anything.
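    For readers who want to check the current state of affairs, here is a
    minimal sketch using Python's standard unicodedata module (the results
    reflect whatever Unicode version the Python build ships with). It shows
    that U+047C carries no decomposition mapping, so normalization never
    relates it to the sequence U+0460 U+0483 it visually resembles:

```python
import unicodedata

# U+047C as a single code point, and the omega + combining titlo
# sequence that it visually resembles.
precomposed = "\u047C"
sequence = "\u0460\u0483"

# Print the official character names for all three code points involved.
for ch in (precomposed, "\u0460", "\u0483"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(precomposed)))

# With no mapping, NFD leaves U+047C unchanged, and NFC does not compose
# the two-code-point sequence into U+047C: the two remain distinct.
print(unicodedata.normalize("NFD", precomposed) == precomposed)
print(unicodedata.normalize("NFC", sequence) == sequence)
print(unicodedata.normalize("NFC", sequence) == precomposed)
```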

    > > The next letter, however, 047E, is indeed a separate letter, and it is
    > > read as "ot" in text.
    > I doubt that a Cyrillic counterpart of U+036D would be productive...

    In fact, U+047E (Cyrillic "OT") is *not* a letter -- it is just a
    commonly used ligature consisting of omega and a superscript Cyrillic
    "te". But in any case, this character has a clear meaning, and there is
    no doubt it was encoded correctly.

    However, a superscript "te" by itself (yes, it is a Cyrillic counterpart
    of U+036D) is really needed for representing some versions of Old
    Slavonic and especially Church Slavonic, as are other similar
    superscript letters (the so-called letter-titlos). I recently tried to
    raise this question on this mailing list. If this character ever gets
    encoded, then it would be correct to use it in the decomposition of

    > > This is all while the numeric titlo is missing altogether: how would
    > > you write Old Slavonic numerals in Unicode?

    Well, there are many problems with historical Cyrillic in Unicode, but
    I don't see any special problem *here*. All manuals of Old Slavonic and
    Church Slavonic state that the same titlo is used both as a contraction
    mark and as a numeral sign. So there is absolutely no reason to
    introduce a duplicate of the existing character U+0483.
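    As a small illustration of that point (a sketch, not a complete numeral
    converter), the numeral 1 can be written as the letter "az" followed by
    the same combining titlo U+0483 that marks contractions. The helper
    single_letter_numeral and its four-entry value table are hypothetical,
    cover only single-letter numbers, and ignore the placement rules for
    multi-letter numerals.

```python
import unicodedata

TITLO = "\u0483"  # COMBINING CYRILLIC TITLO, used for both roles

# Hypothetical, deliberately tiny value table (small Cyrillic letters):
# az = 1, vedi = 2, glagoli = 3, dobro = 4.
LETTER_VALUES = {"\u0430": 1, "\u0432": 2, "\u0433": 3, "\u0434": 4}


def single_letter_numeral(value: int) -> str:
    """Return a one-letter Cyrillic numeral marked with U+0483."""
    for letter, v in LETTER_VALUES.items():
        if v == value:
            return letter + TITLO
    raise ValueError(f"no single-letter numeral for {value} in this sketch")


one = single_letter_numeral(1)
print(one, [f"U+{ord(c):04X}" for c in one])
# The titlo is an ordinary nonspacing mark; no separate numeric titlo is needed.
print(unicodedata.name(TITLO), unicodedata.category(TITLO))
```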

    Alexej Kryukov <akrioukov at newmail dot ru>
    Moscow State University
    Historical Faculty