Re: Not really about "ASCII and Unicode lifespan" anymore...

From: Alexej Kryukov (akrioukov@newmail.ru)
Date: Mon Sep 05 2005 - 13:43:55 CDT

In reply to: Anto'nio Martins-Tuva'lkin: "Not really about "ASCII and Unicode lifespan" anymore..."

> On 2005.05.20, 20:43, Alexander Kh. <alexkh@writeme.com> wrote:
> > What I meant is, for example, the strange "semisoft sign" 048C, which
> > looks the same as Yat 0462. Is that someone's joke?
>
> The Standard reads, in the notes for U+048D (the lowercase form
> of U+048C), that it is (or was) used in (a?) Kildin Sami orthography.
> How much of a joke do you think that is?

I know almost nothing about Kildin Sami, but I can add that in TeX
Cyrillic codepages yat' and the semisoft sign are also treated as
separate characters. The difference between them is especially clear in
italic fonts, where the semisoft sign looks like a slanted version of
the upright character, while yat' has quite a different shape, similar
to a combination of Cyrillic italic "pe" (or Latin "n") and the Cyrillic
soft sign U+044C.
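For what it's worth, the separation is visible in Unicode's own character
data too. A minimal sketch using Python's standard `unicodedata` module
(assuming the interpreter's bundled Unicode tables, which are newer than
the ones current in 2005) lists the four code points involved and checks
that neither case mapping nor compatibility normalization relates the
semisoft sign to yat':

```python
import unicodedata

# Semisoft sign and yat', capital and small, as discussed above.
CODE_POINTS = ["\u048C", "\u048D", "\u0462", "\u0463"]

for ch in CODE_POINTS:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The small semisoft sign pairs with its own capital, not with yat'.
print("\u048C".lower() == "\u048D")  # True
# Not even compatibility normalization folds the semisoft sign into yat'.
print(unicodedata.normalize("NFKC", "\u048C") == "\u0462")  # False
```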

> > Or, for example, letter 047C - Omega with titlo: why a separate
> > letter if the titlo is defined separately?

This is a really interesting question. If "Omega with titlo" is meant to
represent just omega with a titlo, then this character is pointless:
other letter+titlo combinations are far more common, yet none of them
is encoded in Unicode as a precomposed character.

However, I have the impression that this code point was introduced to
represent a quite different character: the so-called "beautiful
omega". Originally, "beautiful omega" was just a combination of omega
with smooth breathing and circumflex accent, but it has since evolved
into a specific ornamental design (hence the name), so that in printed
Church Slavonic books it is difficult even to identify the original
components. In most editions the diacritical mark above "beautiful
omega" (sometimes called the "great apostrophe") really does look
similar to some versions of the titlo, which may explain why the
character was included in Unicode under this name.

I don't know whether my interpretation is correct, but it seems to be
the only logical explanation for including this character in Unicode.
Note that some font developers already identify U+047C/U+047D with
"beautiful omega" and design these glyphs accordingly.

> Perhaps because it was included in a pre-1990 standard (GOST?) -- at
> least that is the reason for the inclusion of hundreds of precomposed
> characters?

This guess is surely wrong: no officially approved codepage has ever
included any historical Cyrillic glyphs, and neither has GOST. There
are some semi-official standards, supported by their user communities,
like T2D in TeX or the so-called Unified Church Slavonic
(see http://irmologion.ru/ucsenc.html -- sorry, in Russian only), but
I am afraid the Unicode people never even studied these standards.

> But if you ask why U+047C isn't canonically decomposable
> as U+0460 U+0483 (ditto for lower case) -- well I also would like to
> know the reason.

If my guess (explained above) is correct, then "beautiful omega" should
be treated as a character in its own right and not decomposed into
anything else.
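Whatever the historical reason, the current state of affairs is easy to
check. A small sketch with Python's standard `unicodedata` module (again
assuming the interpreter's bundled Unicode tables) shows that U+047C
carries no decomposition mapping, so normalization never relates it to
the sequence U+0460 U+0483; a precomposed Latin letter is included for
contrast:

```python
import unicodedata

OMEGA_WITH_TITLO = "\u047C"        # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
OMEGA_PLUS_TITLO = "\u0460\u0483"  # CAPITAL OMEGA + COMBINING CYRILLIC TITLO

# For contrast: U+00E9 has a canonical decomposition, U+047C has none.
print(repr(unicodedata.decomposition("\u00E9")))           # '0065 0301'
print(repr(unicodedata.decomposition(OMEGA_WITH_TITLO)))   # ''

# Hence NFD does not split U+047C, and NFC does not build it.
print(unicodedata.normalize("NFD", OMEGA_WITH_TITLO) == OMEGA_PLUS_TITLO)  # False
print(unicodedata.normalize("NFC", OMEGA_PLUS_TITLO) == OMEGA_WITH_TITLO)  # False
```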

> > Next letter, however, 047E is indeed a separate letter, and it is
> > read as "ot" in text.
>
> I doubt that a Cyrillic counterpart of U+036D would be productive...

In fact, U+047E (Cyrillic "OT") is *not* a letter -- it is just a
commonly used ligature, consisting of omega and superscript Cyrillic
"te". But, anyway, this character has a clear meaning, and there is
no doubt it was encoded correctly.

However, a standalone superscript "te" (yes, a Cyrillic counterpart of
U+036D) is really needed to represent some varieties of Old Slavonic
and especially Church Slavonic, as are other similar superscript
letters (the so-called letter-titlos). I tried to raise this question
on this mailing list recently. If this character is ever encoded, it
would be correct to use it in the decomposition of U+047E/U+047F.

> > This is all while the numeric titlo is missing altogether: how would
> > you write Old Slavonic numerals in Unicode?

Well, there are many problems with historical Cyrillic in Unicode, but
I don't see any special problem *here*. All manuals of Old Slavonic and
Church Slavonic state that the same titlo is used both as a contraction
mark and as a numeral sign. So there is absolutely no reason to
introduce a duplicate of the existing character U+0483.
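To illustrate that one mark suffices, here is a rough sketch in Python
that writes 1 to 99 as Cyrillic letter numerals with the ordinary
U+0483, next to the usual abbreviation of "Bog" (God) carrying the same
titlo. The letter values, the use of "che" for 90 (older texts may use
koppa, U+0481), and placing the titlo after the penultimate letter are
assumptions for illustration only; placement varies between editions,
and `cyrillic_numeral` is a hypothetical helper, not an existing API.

```python
import unicodedata

TITLO = "\u0483"  # COMBINING CYRILLIC TITLO, one mark for both uses

# Assumed letter values; escapes avoid confusion with Latin lookalikes.
UNITS = ["", "\u0430", "\u0432", "\u0433", "\u0434",        # 1-4: а в г д
         "\u0435", "\u0455", "\u0437", "\u0438", "\u0473"]  # 5-9: е ѕ з и ѳ
TENS = ["", "\u0456", "\u043A", "\u043B", "\u043C",         # 10-40: і к л м
        "\u043D", "\u046F", "\u043E", "\u043F", "\u0447"]   # 50-90: н ѯ о п ч


def cyrillic_numeral(n: int) -> str:
    """Hypothetical helper: write 1..99 as letters marked with a titlo."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1..99")
    tens, units = divmod(n, 10)
    if tens == 1 and units:
        letters = UNITS[units] + TENS[1]  # teens: unit letter comes first
    else:
        letters = TENS[tens] + UNITS[units]
    # Assumed convention: titlo after the penultimate letter (or the only one).
    pos = max(len(letters) - 1, 1)
    return letters[:pos] + TITLO + letters[pos:]


# The contraction of "Bog" (God) uses exactly the same combining mark.
ABBREVIATION = "\u0411\u0433" + TITLO + "\u044A"  # Б, г, titlo, ъ

for n in (1, 11, 21, 99):
    print(n, cyrillic_numeral(n))
print(ABBREVIATION, unicodedata.name(TITLO))
```

Nothing numeral-specific is needed at the encoding level; whether a
titled letter sequence is read as a number or as a contraction is left
to the reader, as in print.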

-- 
Regards,
Alexej Kryukov <akrioukov at newmail dot ru>
Moscow State University
Historical Faculty

This archive was generated by hypermail 2.1.5 : Mon Sep 05 2005 - 13:47:08 CDT