RE: Generic Base Letter

From: Vincent Setterholm (vincent@logos.com)
Date: Sun Jun 27 2010 - 06:05:49 CDT


    I tried sending this once with a small attachment showing what I'm seeing, but it doesn't look like it got forwarded, so I'll include some HTML at the end of this email you can paste into a .html file so you can see the behavior I'm talking about. This same behavior occurs in IE8, Word 2007 and WPF applications (even built with .NET 4).

    You can also see the problem in the plain text below if you're using Outlook or IE-based web mail (the order of the marks flips if you set the paragraph direction to RTL, but it's basically the same problem: extra circles, nothing combining properly):

    ◌ָּ

    If you can make those three code points (25CC 05BC 05B8) combine in IE8, you're my hero (though I really need this working in WPF as well, since that is the display technology we're using).
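
For anyone reproducing this, the properties of those three code points can be checked with a minimal Python sketch using only the standard `unicodedata` module. The helper name `describe` is made up for illustration. It shows that U+25CC is a symbol (general category So) rather than a letter, and that the two Hebrew points are non-spacing marks whose canonical order (by combining class) is qamats before dagesh. The two orders are canonically equivalent, so this is not offered as an explanation of the rendering problem, only as a way to rule out a bad sequence.

```python
import unicodedata

def describe(text):
    """Return (code point, name, general category, combining class) per character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]

# The sequence from the message: dotted circle, dagesh, qamats.
sample = "\u25CC\u05BC\u05B8"

for row in describe(sample):
    print(row)

# Normalization reorders marks by canonical combining class
# (qamats = 18 sorts before dagesh = 21); the base is untouched.
normalized = unicodedata.normalize("NFC", sample)
print([f"U+{ord(ch):04X}" for ch in normalized])
```

If a renderer treats the as-typed and the normalized order differently, that would be worth noting in a bug report, since both spell the same text.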
     
    So if Microsoft allows some combining marks to combine with 25CC, they certainly aren't permitting Hebrew vowels to do so. (I did an experiment with 0308 and the display looked crummy, but at least there Microsoft wasn't inserting an extra dotted circle, so font design work might be able to resolve that; this is not the case for characters on the Hebrew code page.) As I stated previously, I can't just pick a regular Hebrew letter, as I need to show combinations that include prefixes, suffixes and infixes along with the vowel pattern, so introducing extra consonants would defeat the purpose.

    HTML snip:

    <html>
    <body>
    <h1>Internet Explorer 8 display snafu demo</h1>
    <P lang="he" dir="rtl"><font face="SBL Hebrew" size="11">
    &#x25CC;&#x05BC;&#x05B8;
    </font></p>
    </body>
    </html>
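
To try more combinations than the single one above, a test page in the same style can be generated rather than typed by hand. The following is only a sketch: the function names `to_ncr` and `build_test_page` are hypothetical, and the markup simply mirrors the snippet above (same font name, same `lang` and `dir` attributes). It writes each sequence as hexadecimal numeric character references, so the file itself stays plain ASCII.

```python
def to_ncr(text):
    """Encode every character as a hexadecimal numeric character reference."""
    return "".join(f"&#x{ord(ch):04X};" for ch in text)

# Mirrors the paragraph markup of the hand-written snippet.
PARAGRAPH = '<p lang="he" dir="rtl"><font face="{font}" size="{size}">{body}</font></p>'

def build_test_page(sequences, font="SBL Hebrew", size="11"):
    """Return an HTML page with one paragraph per code point sequence."""
    paragraphs = "\n".join(
        PARAGRAPH.format(font=font, size=size, body=to_ncr(seq))
        for seq in sequences
    )
    return f"<html>\n<body>\n{paragraphs}\n</body>\n</html>\n"

# The two cases mentioned in this message: dotted circle with
# dagesh + qamats, and dotted circle with U+0308.
page = build_test_page(["\u25CC\u05BC\u05B8", "\u25CC\u0308"])
print(page)
```

Saving the returned string to a .html file and opening it in the browser under test gives one line per sequence for side-by-side comparison.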

    ________________________________________
    From: unicode-bounce@unicode.org [unicode-bounce@unicode.org] On Behalf Of Philippe Verdy [verdy_p@wanadoo.fr]
    Sent: Sunday, June 27, 2010 1:54 AM
    To: Vincent Setterholm; Otto Stolz
    Cc: 'unicode@unicode.org'
    Subject: RE: Generic Base Letter

    I don't know what Microsoft does, but at least, combining 25CC with a
    combining diacritic DOES work in current versions of Internet
    Explorer.

    But as it is known that this could cause a problem, for example when
    rendering charts on the web, a simple solution generally adopted
    involves the use of a more natural arbitrary base character, and some
    other presentation style (such as colored backgrounds).

    See examples such as these (diacritics are shown with a natural base
    character, but with a consistent blue background for all tables):

    - http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0300
    (it uses the Latin letter 'o' for diacritics used with the Latin script)

    - http://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode/U0590
    (it uses the Hebrew letter SHIN for all Hebrew diacritics)

    - http://fr.wikipedia.org/wiki/Table_des_caractères_Unicode/U0600
    (another Arabic letter is used for all Arabic diacritics)

    And so on...

    Additionally, the controls are shown with a red background, and format
    controls are within a box with a dashed border. Unallocated codepoints
    are shown with a grey background. There's no risk of confusion with a
    true dotted circle symbol.

    But the Unicode and ISO/IEC 10646 charts (in PDFs or printed books)
    need to be monochrome, so instead of using distinctive color
    background, it's normal that they use a symbol that cannot be exactly
    similar to an encoded character.

    Philippe.

    "Vincent Setterholm" <vincent@logos.com> wrote:
    >
    > I've tried using 25CC. The problem I'm running into is that the font designer can make marks combine with 25CC just fine but then Microsoft simply ignores the look-up tables that shape these combinations and inserts their own dotted circle (or circles - one per combining mark) anyway.
    >
    > I don't know what effect using a 'symbol' for a letter has on indexing or searching or line/word breaking because I haven't even gotten so far as to get the display to look right, but I'm guessing there'd also be an advantage to such a character having letter semantics.
    >
    > This need to display marks, well-formed on a generic base, is a really common phenomenon. Countless grammars and other philology and linguistics books/articles/etc. have to represent these types of patterns. I think there needs to be an official solution for placing marks on a generic base that behaves like a letter - something documented so that future font designers can support this and so that the technology providers like Microsoft, ICU, etc. have clear directions on how to support this.
    >
    > If using 25CC really is the answer, then let's publish that solution as part of the Unicode Standard so that all font designers can follow this convention and so that we can have some hope of companies like Microsoft supporting the standard.
    >
    > ________________________________________
    > From: Otto Stolz [Otto.Stolz@uni-konstanz.de]
    > Sent: Saturday, June 26, 2010 8:03 AM
    > To: Vincent Setterholm
    > Cc: 'unicode@unicode.org'
    > Subject: Re: Generic Base Letter
    >
    > Hi Vincent Setterholm,
    >
    > you have been asking:
    > > What I'd like to see is a code point for a generic base character
    >
    > You could try U+25CC DOTTED CIRCLE, though the reference glyph
    > for this character is larger than the dotted circles used to
    > attach the various combining marks, in their respective reference
    > glyphs.
    >
    > Best wishes,
    > Otto Stolz
    >



    This archive was generated by hypermail 2.1.5 : Sun Jun 27 2010 - 06:08:16 CDT