Re: Transcriptions of "Unicode"

From: addison@inter-locale.com
Date: Fri Dec 08 2000 - 02:47:47 EST


Hi Erik,

I didn't mean to imply that what NN does is wrong. This is the behavior
that I expect, and, in fact, I'm pretty impressed with the fact that it
*does* work. The LANG page attribute thing was news to me and I'm glad to
see it: it means that page generators and designers can control *how* a
page is displayed (so Japanese pages look "Japanese" while pages that use
characters entirely in the JIS range but which are in, say, Chinese look
Chinese).

My comment on "user's locale" was more intended as "page locale (at
generation time on the server)"---how does the server generate the content
on a page and format data. If I'm looking at a Japanese page I expect to
see it in Japanese, not my current system locale (which is usually
something else).
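
For anyone following along, the fallback order Erik describes in the quoted
text (LANG attribute first, then a non-Unicode charset, then the user's
locale's language) might be sketched roughly as below. This is only my reading
of his description, not Mozilla's actual code: the function name, the table
names and every table entry other than the two charset examples he gives
(iso-8859-1 -> Western, shift_jis -> ja) are assumptions made for
illustration.

```python
import re
from typing import Optional

# The two charset entries come from the examples in the thread; the rest of
# these tables is hypothetical and only illustrates the shape of the lookup.
CHARSET_TO_GROUP = {"iso-8859-1": "Western", "shift_jis": "ja"}
UNICODE_CHARSETS = {"utf-8", "utf-16"}
LANG_TO_GROUP = {"en": "Western", "ja": "ja", "zh": "zh", "ko": "ko"}


def primary_language(tag: str) -> str:
    """Return the leading language part of 'ja-JP', 'ja_JP.eucJP', etc."""
    return re.split(r"[-_.@]", tag.strip())[0].lower()


def select_language_group(lang_attr: Optional[str],
                          charset: Optional[str],
                          locale_lang: str) -> Optional[str]:
    """Pick a font language group using the fallback order from the thread."""
    # 1. A LANG attribute on the element (or a parent) takes precedence.
    #    Falling through on an unknown language is an assumption here.
    if lang_attr:
        group = LANG_TO_GROUP.get(primary_language(lang_attr))
        if group:
            return group

    # 2. Otherwise use the charset, but only if it is not Unicode-based,
    #    since UTF-8 and friends say nothing about the language.
    if charset:
        name = charset.strip().lower()
        if name not in UNICODE_CHARSETS and name in CHARSET_TO_GROUP:
            return CHARSET_TO_GROUP[name]

    # 3. Otherwise fall back to the language of the user's locale
    #    (the LANG environment variable on Unix, the system locale on Windows).
    return LANG_TO_GROUP.get(primary_language(locale_lang))
```

With these made-up tables, a UTF-8 page with no LANG attribute viewed under a
ja_JP locale ends up in the "ja" group, which matches the behavior discussed
in the thread. Note that nothing in the sketch looks at the character codes
themselves.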

Best Regards,

Addison

===========================================================
Addison P. Phillips Principal Consultant
Inter-Locale LLC http://www.inter-locale.com
Los Gatos, CA, USA mailto:addison@inter-locale.com

+1 408.210.3569 (mobile) +1 408.904.4762 (fax)
===========================================================
Globalization Engineering & Consulting Services

On Wed, 6 Dec 2000, Erik van der Poel wrote:

> Yes, N6 will choose some fonts from the system for whatever characters
> are in the document, whether transmitted in UTF-8 or not.
>
> My point was that it does not use the Unicode character code to select
> one of the "language groups" from the list that we have in the font
> preferences dialog. The language group is selected by a LANG attribute,
> if any, otherwise by the charset, if non-Unicode-based, otherwise the
> user's locale's language.
>
> Yes, I call it the "user's locale" because that's what it is on Unix
> (basically). The user can set an environment variable called LANG for
> each Unix process (but typically for all of them). On Windows, we just
> use the system locale.
>
> The font selection is indeed somewhat haphazard for CJK when there are
> no LANG attributes and the charset doesn't tell us anything either, but
> then, what do you expect in that situation anyway? I suppose we could
> deduce that the language is Japanese for Hiragana and Katakana, but what
> should we do about ideographs? Don't tell me the browser has to start
> guessing the language for those characters. I've had enough of the
> guessing game. We have been doing it for charsets for years, and it has
> led to trouble that we can't back out of now. I think we need to draw
> the line here, and tell Web page authors to mark their pages with LANG
> attributes or with particular fonts, preferably in style sheets.
>
> (No, we do not prefer Japanese fonts for characters that are in JIS X
> 0208. I could tell you the details of the current code if you want, but
> I doubt that others care about it, especially since we're talking about
> a fringe case where neither the document nor charset nor locale tell us
> which font to use.)
>
> Erik
>
> addison@inter-locale.com wrote:
> >
> > But NN6 *does* select a font for characters outside the so-called user's
> > locale when said characters are in a UTF-8 page. It appears that this
> > mechanism is somewhat haphazard for CJK unified ideographs: I get a mix of
> > fonts usually (probably because ja is in my locale "stack" currently and
> > 'zh' and 'ko' are not, so I guess Japanese fonts are preferred for
> > characters that are in JIS X 0208?).
> >
> > AP
> >
> > On Mon, 4 Dec 2000, Erik van der Poel wrote:
> >
> > > Mark Davis wrote:
> > > >
> > > > What wasn't clear from his message
> > > > is whether Mozilla picks a reasonable font if the language is not there.
> > >
> > > Sorry about the lack of clarity. When there is no LANG attribute in the
> > > element (or in a parent element), Mozilla uses the document's charset as
> > > a fallback. Mozilla has font preferences for each language group. The
> > > language groups have been set up to have a one-to-one correspondence
> > > with charsets (roughly). E.g. iso-8859-1 -> Western, shift_jis -> ja.
> > > When the charset is a Unicode-based one (e.g. UTF-8), then Mozilla uses
> > > the language group that contains the user's locale's language.
> > >
> > > In other words, Mozilla does not (yet) use the Unicode character codes
> > > to select fonts. We may do this in the future.
> > >
> > > Erik
> > >
>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:17 EDT