Re: Transcriptions of "Unicode"

From: Erik van der Poel (erik@netscape.com)
Date: Wed Dec 06 2000 - 17:29:22 EST


Yes, N6 will choose some fonts from the system for whatever characters
are in the document, whether transmitted in UTF-8 or not.

My point was that it does not use the Unicode character code to select
one of the "language groups" from the list that we have in the font
preferences dialog. The language group is selected by a LANG attribute,
if any; otherwise by the charset, if it is not Unicode-based; otherwise
by the language of the user's locale.
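
To make the order of precedence concrete, here is a rough Python sketch of
that fallback chain. It is not the Mozilla code: the function name, both
lookup tables, and every entry other than iso-8859-1 -> Western and
shift_jis -> ja are assumptions made up for illustration. Letting an
unrecognized LANG value fall through to the next step is also an assumption.

```python
# Illustrative sketch only; not the actual Mozilla implementation.

# Hypothetical language -> language group table.
LANGUAGE_TO_GROUP = {"en": "Western", "fr": "Western", "ja": "ja", "ko": "ko"}

# Hypothetical charset -> language group table (roughly one-to-one).
CHARSET_TO_GROUP = {"iso-8859-1": "Western", "shift_jis": "ja"}

# Unicode-based charsets say nothing about the language.
UNICODE_CHARSETS = {"utf-8", "utf-16", "utf-16le", "utf-16be"}


def group_for_language(tag):
    """Map a language tag such as 'ja' or 'en-US' to a group, or None."""
    primary = tag.split("-")[0].lower()
    return LANGUAGE_TO_GROUP.get(primary)


def select_language_group(lang_attr, charset, locale_language):
    """Pick a language group: LANG attribute, then charset, then locale."""
    # 1. A LANG attribute on the element (or a parent element) wins.
    if lang_attr:
        group = group_for_language(lang_attr)
        if group is not None:  # assumption: unknown tags fall through
            return group
    # 2. Otherwise a non-Unicode charset implies a group.
    cs = (charset or "").lower()
    if cs not in UNICODE_CHARSETS and cs in CHARSET_TO_GROUP:
        return CHARSET_TO_GROUP[cs]
    # 3. Otherwise use the group containing the user's locale's language.
    return group_for_language(locale_language)
```

Note that the character codes themselves never enter into the decision: a
UTF-8 page without LANG attributes always ends up at step 3.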

Yes, I call it the "user's locale" because that's what it is on Unix
(basically). The user can set an environment variable called LANG for
each Unix process (but typically for all of them). On Windows, we just
use the system locale.
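
As an illustration of the Unix case, the following Python sketch pulls the
language out of a LANG value of the usual POSIX form
language[_territory][.codeset][@modifier]. The function name is hypothetical,
and the sketch reads only LANG, as described above; it ignores LC_ALL and the
other LC_* variables, which can override LANG on a real system.

```python
import os


def locale_language(environ=os.environ):
    """Return the language part of LANG (e.g. 'ja'), or None if unset."""
    value = environ.get("LANG", "")
    # Strip the @modifier, then the .codeset, then the _territory.
    for sep in ("@", ".", "_"):
        value = value.split(sep, 1)[0]
    # "C" and "POSIX" name the default locale, which implies no language.
    if value in ("", "C", "POSIX"):
        return None
    return value.lower()
```

For example, a process started with LANG=ja_JP.eucJP would report "ja".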

The font selection is indeed somewhat haphazard for CJK when there are
no LANG attributes and the charset doesn't tell us anything either, but
then, what do you expect in that situation anyway? I suppose we could
deduce that the language is Japanese for Hiragana and Katakana, but what
should we do about ideographs? Don't tell me the browser has to start
guessing the language for those characters. I've had enough of the
guessing game. We have been doing it for charsets for years, and it has
led to trouble that we can't back out of now. I think we need to draw
the line here, and tell Web page authors to mark their pages with LANG
attributes or with particular fonts, preferably in style sheets.

(No, we do not prefer Japanese fonts for characters that are in JIS X
0208. I could tell you the details of the current code if you want, but
I doubt that others care about it, especially since we're talking about
a fringe case where neither the document nor charset nor locale tell us
which font to use.)

Erik

addison@inter-locale.com wrote:
>
> But NN6 *does* select a font for characters outside the so-called user's
> locale when said characters are in a UTF-8 page. It appears that this
> mechanism is somewhat haphazard for CJK unified ideographs: I get a mix of
> fonts usually (probably because ja is in my locale "stack" currently and
> 'zh' and 'ko' are not, so I guess Japanese fonts are preferred for
> characters that are in JIS X 0208 ??).
>
> AP
>
> ===========================================================
> Addison P. Phillips Principal Consultant
> Inter-Locale LLC http://www.inter-locale.com
> Los Gatos, CA, USA mailto:addison@inter-locale.com
>
> +1 408.210.3569 (mobile) +1 408.904.4762 (fax)
> ===========================================================
> Globalization Engineering & Consulting Services
>
> On Mon, 4 Dec 2000, Erik van der Poel wrote:
>
> > Mark Davis wrote:
> > >
> > > What wasn't clear from his message
> > > is whether Mozilla picks a reasonable font if the language is not there.
> >
> > Sorry about the lack of clarity. When there is no LANG attribute in the
> > element (or in a parent element), Mozilla uses the document's charset as
> > a fallback. Mozilla has font preferences for each language group. The
> > language groups have been set up to have a one-to-one correspondence
> > with charsets (roughly). E.g. iso-8859-1 -> Western, shift_jis -> ja.
> > When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses
> > the language group that contains the user's locale's language.
> >
> > In other words, Mozilla does not (yet) use the Unicode character codes
> > to select fonts. We may do this in the future.
> >
> > Erik
> >


