RE: accented cyrillic characters

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Feb 24 2003 - 07:57:57 EST

    Barnie De Los Angeles wrote:
    > Even after studying the Unicode web site for a while I am not able to
    > find a solution for this issue.
    >
    > The task is to include accented Cyrillic characters (vowels only) in
    > Russian HTML. (Vowels are accented or "stressmarked" in Russian for
    > educational purposes.)
    >
    > My html pages are always utf-8 encoded.
    >
    > "Pre-accented" Russian vowels obviously do not exist as Unicode
    > characters of their own.
    >
    > I only need one kind of accent. Its Unicode number is
    > probably 0301 and it is called "accent" or sometimes
    > "stressmark".
    >
    > The remaining question is how to "combine" this accent with a
    > vowel, or: how to get that damned stress mark 0301 on top of a
    > character?

    You simply put the accent character *after* the letter character. Either
    character can be encoded directly (e.g. in UTF-8) or with a numeric
    character reference:

            а́
            а&#x301;
            &#x430;́
            &#x430;&#x301;

    The fourth notation should work independently of the page encoding, while
    the other three require a charset declaration, either sent by your server or
    inserted in the <head>...</head> section of your file:

            <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
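
    As a rough illustration of the ordering rule above (base letter first,
    combining accent second), here is a minimal Python sketch. The helper names
    `stress` and `to_ncr` are made up for this example and are not part of any
    library. It builds the accented letter and then writes the same text using
    only numeric character references, i.e. the notation that does not depend on
    the page encoding.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # U+0301 COMBINING ACUTE ACCENT


def stress(letter: str) -> str:
    """Return the letter followed by the combining acute accent."""
    return letter + COMBINING_ACUTE


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):x};" for ch in text)


accented = stress("\u0430")  # U+0430 CYRILLIC SMALL LETTER A

# The result is still two code points: letter first, accent second.
print([unicodedata.name(ch) for ch in accented])

# Encoding-independent form, safe to paste into any HTML page.
print(to_ncr(accented))  # &#x430;&#x301;
```

    Normalizing the string to NFC leaves it as two code points, which matches
    the observation that no precomposed accented Russian vowels exist.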

    The visual result depends on the fonts installed on the computers of the
    people reading your page. The typical results are:

            1. Two rectangles: no font supports Cyrillic or combining marks;

            2. A rectangle with an accent on top of it: the font supports
    combining marks but no Cyrillic;

            3. A Cyrillic "a" followed by a rectangle: the font supports
    Cyrillic but no combining marks;

            4. A Cyrillic "a" with an accent too high on top of it: the font
    supports Cyrillic and combining marks, but it is not a "smart font" (the
    accent is so high in order to also fit on a capital letter);

            5. A Cyrillic "a" with an accent on top of it, placed at a correct
    height: the font supports Cyrillic, combining marks, and it is a "smart
    font".

    As an author, what you can do to try and force result 5 (or 4, at least) is:

            - Specifying that that piece of text should use one of the commonly
    available fonts that fit your needs, in order of preference. You can do this
    with a Cascading Style Sheet or with the <font> tag. E.g.:
            <font face="Code2000, Arial Unicode MS, Arial, Times New Roman">&#x430;&#x301;</font>
            To do this, you must make some assumptions about the kind of
    operating system(s) used by your users, and know which fonts are commonly
    available on those computers.

            - Adding a link to a help page (written in English and/or with
    Russian text included as a picture) which explains to users how they can set
    up their computers to have the proper font support.

            - Doing nothing. You have done your part, encoding the page
    correctly, so let the users do their homework too.
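
    For the style sheet route mentioned in the first option, the following
    Python sketch generates a <span> with an inline font-family list instead of
    a <font> tag. The function name `font_span` is hypothetical, the font names
    are simply the ones from the example above, and the text is assumed to be
    already safe HTML (here, numeric character references), so nothing is
    escaped.

```python
def font_span(html_text: str, fonts: list[str]) -> str:
    """Wrap already-escaped HTML text in a span with a font-family preference list.

    Font names containing spaces are put in single quotes, as CSS recommends,
    so they can sit inside the double-quoted style attribute.
    """
    families = ", ".join(f"'{name}'" if " " in name else name for name in fonts)
    return f'<span style="font-family: {families}">{html_text}</span>'


preferred = ["Code2000", "Arial Unicode MS", "Arial", "Times New Roman"]
print(font_span("&#x430;&#x301;", preferred))
```

    The browser tries the listed fonts in order, so the same caveat applies: the
    list is only as good as your guess about what your readers have installed.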

    _ Marco


