confusables.txt, the official standard, and font requirements

From: Andrew S (
Date: Wed Oct 26 2005 - 14:51:40 CST


    At is:
    "Not all sans-serif fonts allow an easy distinction between lowercase l, and uppercase I and not all monospaced (monowidth) fonts allow a distinction between the letter l and the digit one. Such fonts are not usable for mathematics. In Fraktur, the letters I and J in particular must be made distinguishable. Overburdened Black Letter forms are inappropriate. Similarly, the digit zero must be distinct from the uppercase letter O for all mathematical alphanumeric sets."

    What interests me here is the word "must". This is apparently a requirement placed on fonts in order for them to be considered "Unicode fonts". But is this part of the official standard?

    1. Does the Unicode standard mandate that a compliant font which defines glyphs for both the character U+0031 (Arabic numeral "1") and U+006C (lowercase Latin "l") define those glyphs so that they are not homographic, with "homographic" defined as "a human knowledgeable of Arabic numerals and Latin letters couldn't reasonably be expected to reliably distinguish the glyphs without the aid of context"?
    2. Does Unicode allow a compliant font to define homographic glyphs for U+0041 (first letter of the Latin alphabet) and U+0410 (first letter of the Cyrillic alphabet)? Does it allow the character U+0022 (double quote) to be homographic to the string U+0027 U+0027 (two apostrophes)?
    3. If so, then where can I find the definition of the equivalence relation of permitted homographics, defined on the set of all Unicode character strings (including of course one-character strings)? Is confusables.txt supposed to be the definition of this relation?
    4. Is the equivalence relation of mandatory homographics empty?
    5. Besides inter-glyph homographic constraints, does Unicode define intra-glyph constraints for any characters? For example, does Unicode contain any rules which would prevent a compliant font from defining the glyph for the character U+0041 to look like a Times New Roman "B", or the glyph for the character U+0042 to look like a smiley face?
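
For reference, the code points named in the questions above can be looked up programmatically. The following is only a minimal sketch, assuming Python's standard `unicodedata` module; the helper names `describe` and `same_after_nfkc` are made up for illustration. It prints each character's name and checks whether NFKC normalization treats the pairs from question 2 as the same string. Normalization is a separate mechanism from visual confusability, so a negative result here says nothing about what a font may draw.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point (hypothetical helper)."""
    ch = chr(code_point)
    return f"U+{code_point:04X} {unicodedata.name(ch)}"

def same_after_nfkc(a: str, b: str) -> bool:
    """True if two strings are identical after NFKC normalization.

    This tests compatibility equivalence only; it is not a test of
    whether the strings look alike in any particular font.
    """
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

# Code points mentioned in the questions above.
for cp in (0x0031, 0x006C, 0x0041, 0x0410, 0x0022, 0x0027):
    print(describe(cp))

# Latin A versus Cyrillic A, and a double quote versus two apostrophes.
print(same_after_nfkc("A", "\u0410"))
print(same_after_nfkc('"', "''"))
```

Both comparisons come out false, which suggests that any relation covering such look-alike pairs would have to be defined separately from normalization, for example in a data file such as confusables.txt.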

    I was under the impression that Unicode officially disclaims specification of glyphs and mappings between characters and glyphs, but the quote above seems to show that I was mistaken.
