Re: Combining diacriticals and Cyrillic

From: Philippe Verdy (
Date: Thu Jul 10 2003 - 09:04:56 EDT


    On Thursday, July 10, 2003 10:24 AM, <> wrote:

    > Dear Ladies and Gentlemen,
    > Currently there is an ongoing effort in Bulgaria trying to resolve an
    > issue concerning the way we write in Bulgarian.
    > Our problem is:
    > Usually a regular Bulgarian user does not need to write accented
    > characters. There is one middle-sized exception to this, but
    > generally we do fine without accented characters. The problem is that
    > in some special cases or more serious linguistic work, one definitely
    > needs to be able to write accented characters (accented vowels).
    > One of the ideas is to invent a new ASCII-based encoding, containing
    > the accented characters we need. This would introduce additional
    > disorder in the current mess of Cyrillic encodings, and would
    > introduce problems with automated spellchecking.
    > Generally I believe it would be best to invent a Unicode-based
    > solution.
    > Such a solution is, for example, combining diacritical marks with the
    > Cyrillic letters.
    > I composed a demo page:
    > and then made 10-20 shots of the results on Opera and IE on Linux,
    > Windows 98 and Windows XP:
    > You can see that this approach yields _quite_ inconsistent and useless
    > results, depending on the font, application and operating system
    > being used.

    On Windows XP, there's no "incorrect" rendering. However, the best rendering comes with Arial Unicode MS, which ships with Office but is not among the fonts installed with Windows XP or Internet Explorer.
    The other named fonts are much less common and require explicit installation by the user.

    When none of those fonts is installed, the effective font becomes "sans-serif", which the user settings normally bind to Arial (the default on Windows). The result is correct, with the right grave accents, but the rendering is poor: Arial does not handle the combining sequence with a specially prepared, ligated glyph. It draws a separate non-spacing accent instead, which ends up a bit too high above the ascent line and is not centered on the preceding character.

    The reason is that Arial, /not Arial Unicode MS/, does not contain placement hints for each combining class of diacritics in the definitions of its base characters. Its diacritics rely only on approximate relative positioning built into the non-spacing glyph, with a single relative offset adjusted to work on most Latin letters (the Arial TrueType font does not include any advanced OpenType tables for positioning pairs of glyphs).
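
    The difference between a single built-in offset and per-base placement hints can be sketched with a toy model. This is only an illustration in Python: the Glyph class, the anchor field and all metric values below are hypothetical, and it does not reflect the actual data stored in any real font.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Glyph:
    """Toy base glyph; every number used with it here is made up."""
    name: str
    advance: int                         # advance width, in font units
    top_anchor_x: Optional[int] = None   # hypothetical anchor for marks above


# Hypothetical single offset carried by the non-spacing accent itself,
# tuned so that it looks centered on a base whose advance is 500 units.
FIXED_MARK_OFFSET = -250


def place_fixed(base: Glyph) -> int:
    """Accent x-position when only the mark's own fixed offset is used.

    The accent is drawn relative to the pen position after the base,
    so the result depends on the base width but ignores its shape.
    """
    return base.advance + FIXED_MARK_OFFSET


def place_anchored(base: Glyph) -> int:
    """Accent x-position when the base glyph supplies a placement hint.

    Falls back to the geometric center if the base has no anchor.
    """
    if base.top_anchor_x is not None:
        return base.top_anchor_x
    return base.advance // 2


if __name__ == "__main__":
    narrow = Glyph("latin-like", advance=500, top_anchor_x=250)
    wide = Glyph("wide-cyrillic-like", advance=800, top_anchor_x=400)
    for g in (narrow, wide):
        drift = place_fixed(g) - place_anchored(g)
        print(f"{g.name}: fixed={place_fixed(g)} "
              f"anchored={place_anchored(g)} drift={drift}")
```

    With these made-up numbers the fixed offset happens to coincide with the anchor on the 500-unit base and drifts by 150 units on the 800-unit one, which is the kind of off-center accent described above.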

    However, this text rendered with Arial is still readable and correct according to Unicode, just poorly rendered.
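
    What "correct according to Unicode" means for such text can be checked independently of any font. A minimal Python sketch, using only the standard unicodedata module (the helper names are just for illustration): it builds a Cyrillic vowel followed by U+0300 COMBINING GRAVE ACCENT and applies NFC normalization.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT


def accent(base: str) -> str:
    """Return the base letter followed by a combining grave accent."""
    return base + GRAVE


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    # Cyrillic small letters a, ie, i
    for base in ("\u0430", "\u0435", "\u0438"):
        seq = accent(base)
        nfc = unicodedata.normalize("NFC", seq)
        print(unicodedata.name(base), code_points(seq), "->", code_points(nfc))
```

    Under NFC, the sequence U+0438 U+0300 composes to the precomposed U+045D, and U+0435 U+0300 to U+0450, while U+0430 U+0300 has no precomposed form and stays a two-code-point sequence. The latter case is where rendering depends entirely on how the font and the text engine position the accent.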

    Note that the effective version of these fonts is important: the Arial font provided with Windows 95 is TrueType only. There is no OpenType font support in Windows 95; the Uniscribe engine is only provided as a supplement with Internet Explorer 5+, and it is not used by Netscape 4, nor probably by other browsers.

    On Windows XP, the use of Uniscribe and its support for OpenType fonts is transparent to most applications (it is integrated into most GDI primitives and USER32 GUI components). So the difference is not so much between browsers as between OS versions (and localizations, for older OSes).

    This archive was generated by hypermail 2.1.5 : Thu Jul 10 2003 - 09:56:49 EDT