Romanized Cyrillic bibliographic data--viable fonts?

From: J M Craig (jmcraig@xmission.com)
Date: Mon Aug 26 2002 - 09:27:14 EDT


Anyone at all familiar with bibliographical data (the MARC standards)
knows that they can be a real pain to deal with. In this case, the
difficulty isn't with the MARC data itself, but with the Library of
Congress's Romanization standards and the lack of support for combining
half marks in available fonts. I'm trying to help a client properly
display Romanized Cyrillic from MARC data on a Unicode-enabled
application. The underlying problem is that I can't find an available font
that properly supports the combining half marks U+FE20 and U+FE21.

Alan Wood lists these two fonts on his page of fonts by ranges (a truly
impressive collection of info, BTW, Mr. Wood):

Arial Unicode MS
   Apparently you can only get this with MS Office or Publisher these
days--not a good solution for my client since their budget's very
limited and they'd need it on a bunch of workstations. The most
important issue from a technical point of view is that the marks may not
properly combine and I don't have a copy of the font to test it myself.
Does anyone know if these marks will properly combine with T, t, S, s,
I, i, A, a, & U, u when using the MS font?

Naqsh
   A cursive font (not practical) and the marks don't appear to combine
properly in any case.

Any suggestions welcomed! Is there a tool out there that will allow you
to edit a font to add a couple of missing characters?

(A more extensive explanation of the problem follows for those who want
the gory details.)

John Craig
Alpha-G Consulting, LLC

Gory details:
The bibliographical data in question follows the Library of Congress
Romanization rules (see this link):

http://lcweb.loc.gov/catdir/cpso/romanization/russian.pdf

An effective conversion to Unicode for the specified Romanizations of
these Cyrillic characters is proving elusive:

/ts/
Unicode 0426 (capital) & 0446 (lower case)
/yu/
Unicode 042E & 044E
/ya/
Unicode 042F & 044F

The specified Romanization for each of these Cyrillic characters
includes a ligature over the top of the two Latin code points in
question (presumably to indicate that the Latin characters represent a
single Cyrillic character). Now, the proper Unicode sequence for
what the Library of Congress wants (based on their own documentation of
the correspondences between the MARC ANSEL character set and Unicode)
requires the use of the combining half marks left-half ligature U+FE20
and right-half ligature U+FE21:

/ts/
Unicode 0074 FE20 0073 FE21
<t> <left half ligature> <s> <right half ligature>
/yu/
Unicode 0069 FE20 0075 FE21
<i> <left half ligature> <u> <right half ligature>
/ya/
Unicode 0069 FE20 0061 FE21
<i> <left half ligature> <a> <right half ligature>
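
For anyone who wants to check the sequences by machine, here is a minimal
Python sketch that builds them in the order given above (first letter,
left half, second letter, right half) and prints the code points. It
assumes the plain ASCII letters t (0074), s (0073), i (0069), u (0075)
and a (0061). The names ligate, ROMANIZED and code_points are made up for
illustration, and only the lower-case Cyrillic letters are covered. It
says nothing about whether a given font will actually render the marks.

```python
import unicodedata

LEFT_HALF = "\uFE20"   # COMBINING LIGATURE LEFT HALF
RIGHT_HALF = "\uFE21"  # COMBINING LIGATURE RIGHT HALF


def ligate(first, second):
    """Return first + FE20 + second + FE21, the ordering described in the post."""
    return first + LEFT_HALF + second + RIGHT_HALF


# Hypothetical lookup table: lower-case Cyrillic letter -> Romanized sequence.
ROMANIZED = {
    "\u0446": ligate("t", "s"),  # /ts/
    "\u044E": ligate("i", "u"),  # /yu/
    "\u044F": ligate("i", "a"),  # /ya/
}


def code_points(text):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


for cyrillic, latin in ROMANIZED.items():
    # Print the Cyrillic code point, then the Romanized sequence.
    print(f"{ord(cyrillic):04X} -> {code_points(latin)}")

# Confirm the two marks carry the expected Unicode character names.
print(unicodedata.name(LEFT_HALF))
print(unicodedata.name(RIGHT_HALF))
```

Pasting one of the resulting strings into the target application is a
quick way to see whether the installed font draws the two halves as one
joined mark over both letters.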

All very well, but the application can't paint it because of the lack of
the combining half marks in the available fonts.


