RE: Unicode Cyrillic GHE DE PE TE in Serbian

From: Janko Stamenovic (janko@teletrader.com)
Date: Tue Jan 18 2000 - 11:47:37 EST


> Having two different encodings for the *same* letters is not at all an easy
> solution. It is the most complicated thing I could think of!

They would not be the same letters from now on, and that's it! There would
be no such thing as a compatibility problem.

> Imagine searching, for instance: all applications should be changed to know
> that "Russian pe" and "Serbian pe" are to be considered the same, otherwise
> Serbians would always have problems when searching Russian documents, and
> vice versa.

No, they would not have any problems. When they search for a Russian word
in a Russian document, they would switch the "Input locale" to Russian and
type the Russian word.

There is no such thing as "searching for a word without knowing whether it
is Russian or Serbian". Russian and Serbian are different languages and
already have different sets of characters (in pre-computer times we'd have
said they have different alphabets).
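
To make the disputed case concrete, here is a minimal sketch in Python of what
a search would see if "Serbian pe" had its own code point. No such code point
exists; U+E000 below is only a private-use placeholder, and the names
naive_find and folded_find are made up for this illustration.

```python
# Sketch only: U+E000 is a private-use stand-in, NOT a real assignment.
RUSSIAN_PE = "\u043f"   # CYRILLIC SMALL LETTER PE (the existing code point)
SERBIAN_PE = "\ue000"   # hypothetical separately encoded "Serbian pe"

# Folding table that treats the hypothetical letter as the existing one.
FOLD = str.maketrans({SERBIAN_PE: RUSSIAN_PE})


def naive_find(text, word):
    """Plain code-point search: the two 'pe' letters never match."""
    return text.find(word)


def folded_find(text, word):
    """Search after folding both sides onto the existing PE."""
    return text.translate(FOLD).find(word.translate(FOLD))


russian_doc = "это " + RUSSIAN_PE + "оле"
serbian_query = SERBIAN_PE + "оле"   # typed with a "Serbian" input locale

print(naive_find(russian_doc, serbian_query))   # no match
print(folded_find(russian_doc, serbian_query))  # match at the word start
```

With the Russian input locale the query would contain the existing PE and the
plain search would match; the folding step is only needed when the query was
typed with the other locale.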

> Imagine case-conversions: Cyrillic uppercase PE would correspond to two
> separate lowercase letters.

That's a good argument. But if we also add the uppercase letters, we'd keep
1-1 conversions. In any case, 1-1 conversion is not a real requirement that
the Unicode standard has to keep. Read on.

> When you uppercase text there is no problem; but when you *lowercase* it,
> the software needs to know which lowercase to use.

I see this as a far smaller problem than the one we have now. Unicode as a
standard does not even guarantee that when you convert to one case and
back nothing will change!
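
A small Python check of that round-trip claim, as a sketch. It relies on
Python's built-in str.upper() and str.lower(), which apply the Unicode default
case mappings; the helper name round_trips is made up here. German sharp s is
the usual example of a letter that does not survive the trip, while the
existing Cyrillic PE does.

```python
def round_trips(s):
    """True if uppercasing and then lowercasing gives back the same string."""
    return s.upper().lower() == s


# Existing Cyrillic PE: one uppercase, one lowercase, so the trip is lossless.
print(round_trips("\u043f"))   # CYRILLIC SMALL LETTER PE

# German sharp s uppercases to "SS", which lowercases to "ss".
print(round_trips("\u00df"))   # LATIN SMALL LETTER SHARP S
```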

I think most people would live with that much more easily than with seeing
"m" where "t" should be in every printed document.

> Let's also consider display, that is Janko's main concern. With the current
> solution, we have a problem: if Serbian text is displayed with a font
> designed for Russian, some *italic* letters look very strange (and are
> possibly unreadable for those who don't have familiarity with Russian). With
> Janko's solution, we have a much bigger problem: if Serbian text is
> displayed with a font designed for Russian, some letters (italic or not)
> will simply display as black boxes.

Serbian already has a number of letters which do not exist in Russian, so
the only question is whether somebody supports Cyrillic or not. Whoever
supports Cyrillic in Unicode would properly handle all Cyrillic letters
(Serbian, Russian, Ukrainian), which differ between the living Cyrillic
languages.
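
As a rough illustration of the first point, the sketch below compares the two
modern lowercase alphabets in Python. The coverage set is just the Russian
alphabet standing in for a "Russian-only" font, not data from any real font,
and the function name missing_letters is made up.

```python
SERBIAN = "абвгдђежзијклљмнњопрстћуфхцчџш"      # 30 letters
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"   # 33 letters


def missing_letters(text, coverage):
    """Letters of `text` that a font covering only `coverage` cannot show."""
    return sorted(set(text) - set(coverage))


# Serbian letters that a font covering only the Russian alphabet would lack.
serbian_only = missing_letters(SERBIAN, RUSSIAN)
print(" ".join(serbian_only))
print(len(serbian_only))
```

Text containing any of these letters already shows boxes in a font that
covers only the Russian alphabet, with today's encoding.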

> Wrong: Cyrillic it is one of the 3 "jolly" alphabets (together with Latin
> and Arabic) and it is used to write *hundreds* of languages in the former
> USSR and elsewhere. Let's not be Eurocentric considering only superstar
> languages like Russian, Serbocroatian, Bulgarian, etc.: there are a great
> number of "minority" languages in Asia that are written in Cyrillic too.

I agree. Did you know that Mongolian is written in Cyrillic? But has Unicode
received any complaints that any other Cyrillic language was not properly
handled?

> >True, but still this is very straightforward solution for the named problem,
> >since it does not require any new software concepts like "rendering based on
> >language tagging" which requires a lot of changes in many levels of the
> >software: since I'm Windows programmer, I know that such a request would
> >mean that API would have to be extended to pass the *language* information
> >to the font creation engine. This means that Microsoft would have to change
> >the API (very much!), MFC and all their applications etc. and all this just
> >to make possible to display five Serbian letters properly? I don't expect
> >this even in next ten years!
>
> I do not expect this in the next 1000 years. Language tagging or other
> similar complicated things will probably always be mostly for fine word
> processing.

What do you consider "fine word processing", then? In a world in which
everybody has (and uses) a 600 dpi laser printer, and where my ordinary
1024x768 display shows real CURSIVE forms for Latin languages, asking
for "decent" (not perfect) rendering of Serbian and Russian is not such a
big request.

> For many other easier things, it is enough to use one of the two simpler
> solutions that have already been mentioned:
>
> 1) Use language-specific fonts, tailored for Serbian *xor* Russian;

And say goodbye to displaying both Serbian and Russian properly on the
Internet?

> 2) Use compromise fonts for Serbian *and* Russian (using "sloped" or no
> italics).

Can anything be uglier than this?

> >And even so, information "China or Japan" can be squeezed in "charset"
> >field -- but there is not space to squeeze "Serbian" or "Russian" in it.
> >Contrary to that, having Serbian and Russian text in the same document is
> >quite a small goal which should be handled gracefully.
>
> Janko, I don't follow your reasoning here. I thought we were talking about
> one single charset: Unicode.

I am talking about the ways to push the language information to the
rendering engine -- the API must be changed for it. When we speak about
Chinese OR Japanese, we can consider the two of them "different charsets",
since both have a very big number of characters. In contrast, introducing
a "Serbian" charset among the dozen used up to now seems quite strange.

> >It is far from "professional" typesetting engine, but even such applications
> >should offer something decent. And anybody from the people who
> >here "do fonts for living" would tell you that using "sloped" letterforms
> >for Times or any Serif is more than unacceptable.
>
> But I would like someone of them to admit that "sloped" letterforms for
> Helvetica or most sans-serifs fonts *is* more than unacceptable.

Nobody would say this, because that's how Latin letters look in the same
case too. If you'd give up the cursive shapes of Latin letters in Times for
your language, only then would I be prepared to do the same.

> What is not acceptable, IMHO, is the inconsistent choices that you can see,
> e.g., in MS Arial (not Arial Unicode: the older one I mean). The italics for
> Cyrillic letters that look like Latin letters (e.g. a) are simply "sloped",
> but the italics for letters that do *not* look like Latin letters (e.g. pe)
> mimic the Russian italics that normally belong to serifs fonts.

If Times has real cursive for Latin letters, it is really not a big request
that Cyrillic letters look the same. More than that -- otherwise it is going
to be very ugly, since Cyrillic and Latin very often appear in the same
document.


