Re: displaying Unicode text (was Re: Transcriptions of "Unicode")

From: Mark Davis (mark@macchiato.com)
Date: Thu Dec 07 2000 - 12:20:33 EST


Thanks! I appreciate the description. My fears were unfounded.

> This states that *for each character* in the element, the implementation
> is supposed to go down the list of fonts in the font-family property, to
> find a font that exists and that contains a glyph for the current
> character.

I agree that this does not produce the optimal results, since one should
have the freedom to select different fonts based on the context of the
character. The above description is much better than a very coarse-grained
approach (like having the entire document or element in the same font), but
needs some more wriggle-room to allow people flexibility.

Mark

----- Original Message -----
From: "Erik van der Poel" <erik@netscape.com>
To: "Unicode List" <unicode@unicode.org>
Cc: "Unicode List" <unicode@unicode.org>
Sent: Thursday, December 07, 2000 00:30
Subject: Re: displaying Unicode text (was Re: Transcriptions of "Unicode")

> Mark Davis wrote:
> >
> > Let's take an example.
> >
> > - The page is UTF-8.
> > - It contains a mixture of German, dingbats and Hindi text.
> > - My locale is de_DE.
> >
> > From your description, it sounds like Mozilla works as follows:
> >
> > - The locale maps (I'm guessing) to 8859-1
> > - 8859-1 maps to, say, Helvetica.
> > - The dingbats and Hindi appear as boxes or question marks.
> >
> > This would be pretty lame, so I hope I misunderstand you!!
>
> Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes,
> you've misunderstood me, but only because I abbreviated so much. Sorry.
> Let me try again, with more feeling this time.
>
> Using the example above:
>
> - The locale maps to "x-western" (ja_JP would map to "ja", so I've
> prepended "x-" for the "language groups" that don't exist in RFC 1766)
>
> - x-western and CSS' sans-serif map to Arial
>
> - The dingbats appear as dingbats if they are in Unicode and at least
> one of the dingbat fonts on the system has a Unicode cmap subtable
> (WingDings is a "symbol" font, so it doesn't have such a table), while
> the Hindi might display OK on some Windows systems if they have Hindi
> support (Mozilla itself does not support any Indic languages yet).
>
> We could support the WingDings font if we add an entry for WingDings to
> the following table:
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872
>
> We just haven't done that yet.
>
> Basically, Mozilla will look at all the fonts on the system to find one
> that contains a glyph for the current character.
>
> The language group and user locale stuff that I mentioned earlier is
> only one part of the process -- the part that deals with the user's font
> preferences. I'll explain more of the rest of the process:
>
> Mozilla implements CSS2's font matching algorithm:
>
> http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm
>
> This states that *for each character* in the element, the implementation
> is supposed to go down the list of fonts in the font-family property, to
> find a font that exists and that contains a glyph for the current
> character. Mozilla implements this algorithm to the letter, which means
> that fonts are chosen for each character without regard for neighboring
> characters (unlike MSIE). This may actually have been a bad decision,
> since we sometimes end up with text that looks odd due to font changes.
>
> Anyway, Mozilla's algorithm has the following steps:
>
> 1. "User-Defined" font
> 2. CSS font-family property
> 3. CSS generic font (e.g. serif)
> 4. list of all fonts on system
> 5. transliteration
> 6. question mark
>
> You can see these steps in the following pieces of code:
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108
>
> 1. "User-Defined" font (FindUserDefinedFont)
>
> We decided to include the User-Defined font functionality in Netscape 6
> again. It is similar to the old Netscape 4.X. Basically, if the user
> selects this encoding from the View menu, then the browser passes the
> bytes through to the font, untouched. This is for charsets that we don't
> already support. This step needs to be the first step, since it
> overrides everything else.
>
> 2. CSS font-family property (FindLocalFont)
>
> If the user hasn't selected User-Defined, we invoke this routine. It
> simply goes down the font-family list to find a font that exists and
> that contains a glyph for the current character. E.g.:
>
> font-family: Arial, "MS Gothic", sans-serif;
>
> 3. CSS generic font (FindGenericFont)
>
> If the above fails, this routine tries to find a font for the CSS
> generic (e.g. sans-serif) that was found in the font-family property, if
> any, otherwise it falls back to the user's default (serif or
> sans-serif). This is where the font preferences come in, so this is
> where we try to determine the language group of the element. I.e. we
> take the LANG attribute of this element or a parent element if any,
> otherwise the language group of the document's charset, if
> non-Unicode-based, otherwise the user's locale's language group.
>
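
The order of precedence for the language group can be sketched roughly
like this. The lookup tables and the function name language_group are
hypothetical, and falling through when a LANG value has no known group is
my assumption; the description above doesn't say what happens then:

```python
# Hypothetical lookup tables, for illustration only.
LANG_TO_GROUP = {"de": "x-western", "en": "x-western", "ja": "ja"}
CHARSET_TO_GROUP = {"iso-8859-1": "x-western", "shift_jis": "ja"}
LOCALE_TO_GROUP = {"de_DE": "x-western", "ja_JP": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}


def language_group(lang_chain, charset, locale):
    """Resolve the language group used to look up the user's font preferences.

    lang_chain holds the LANG attribute of the element followed by those of
    its parents, innermost first, with None where the attribute is absent.
    """
    # 1. LANG attribute of the element, or of the nearest parent that has one.
    lang = next((value for value in lang_chain if value), None)
    if lang is not None and lang in LANG_TO_GROUP:
        return LANG_TO_GROUP[lang]
    # 2. The document's charset, but only if it is not Unicode-based.
    cs = charset.lower()
    if cs not in UNICODE_CHARSETS and cs in CHARSET_TO_GROUP:
        return CHARSET_TO_GROUP[cs]
    # 3. The language group of the user's locale.
    return LOCALE_TO_GROUP.get(locale)
```

For the UTF-8 page in the example with no LANG attributes and a de_DE
locale, the first two rules don't apply, so the locale decides and the
result is "x-western".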
> 4. list of all fonts on system (FindGlobalFont)
>
> If the above fails, this routine goes through all fonts on the system,
> trying to find one that contains a glyph for the current character.
>
> 5. transliteration (FindSubstituteFont)
>
> If we still can't find a font for this character, we try a
> transliteration table. For example, the euro sign is mapped to the three
> ASCII characters "EUR", which is useful on some Unix systems that don't
> have the euro glyph yet. Actually, this transliteration step isn't even
> implemented on Windows yet.
>
> 6. question mark (FindSubstituteFont)
>
> If we can't find a transliteration, we fall back to the last resort --
> the good ol' question mark.
>
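
Putting the six steps together, here is a rough sketch of the whole
fallback chain for a single character. It only borrows the routine names
from the description above as labels. The FontSystem class, its fields and
the find_font signature are hypothetical, and the real code in
nsFontMetricsWin.cpp and nsFontMetricsGTK.cpp is certainly more involved:

```python
from dataclasses import dataclass, field

GENERICS = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


@dataclass
class FontSystem:
    """Hypothetical stand-in for the platform font layer."""
    coverage: dict       # font name -> set of characters it has glyphs for
    generic_fonts: dict  # (language group, CSS generic) -> preferred font name
    transliterations: dict = field(default_factory=lambda: {"\u20ac": "EUR"})


def find_font(ch, fonts, font_family, lang_group,
              default_generic="serif", user_defined=None):
    """Return (step label, font name or substitute text) for one character."""
    # 1. "User-Defined" font: overrides everything else when selected.
    if user_defined is not None:
        return ("FindUserDefinedFont", user_defined)
    # 2. CSS font-family: first listed font that exists and covers ch.
    for name in font_family:
        if ch in fonts.coverage.get(name, ()):
            return ("FindLocalFont", name)
    # 3. CSS generic: the generic named in font-family, else the user's default,
    #    looked up per language group in the font preferences.
    generic = next((n for n in font_family if n in GENERICS), default_generic)
    name = fonts.generic_fonts.get((lang_group, generic))
    if name is not None and ch in fonts.coverage.get(name, ()):
        return ("FindGenericFont", name)
    # 4. Every font on the system.
    for name, chars in fonts.coverage.items():
        if ch in chars:
            return ("FindGlobalFont", name)
    # 5. Transliteration table, e.g. the euro sign becomes "EUR".
    if ch in fonts.transliterations:
        return ("FindSubstituteFont", fonts.transliterations[ch])
    # 6. Last resort: question mark.
    return ("FindSubstituteFont", "?")


fonts = FontSystem(
    coverage={"Arial": set("abc"), "MS Gothic": {"日"}, "Symbols": {"✂"}},
    generic_fonts={("x-western", "sans-serif"): "Arial",
                   ("ja", "sans-serif"): "MS Gothic"},
)
family = ["Arial", "sans-serif"]
```

With this made-up data, "a" is found at step 2, "日" is found at step 3
when the language group is "ja", the scissors dingbat is only found at
step 4, the euro sign falls through to "EUR", and a Devanagari letter
ends up as "?".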
> That's it. I hope I didn't abbreviate too much this time!
>
> Erik



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT