Re: glyph selection for Unicode in browsers

From: Tex Texin (
Date: Fri Sep 27 2002 - 01:17:02 EDT



    My preference is that tagged information should display as tagged, and
    the user can do something specifically to override it if they want.
    But then, I can't read CJK, so I would be glad to get comments from
    those communities. I can see arguments both for and against letting
    user preference take precedence over tags.

    Where there is no language information in the document, it makes sense
    to have user preference or heuristics attempt to supply it.
    Where the tag is clearly inappropriate, for example text labeled as
    English that is clearly Chinese, sure, override the tag.
    Where the tag is wrong but difficult to detect (Traditional vs.
    Simplified Chinese), too bad: the author gets what he deserves.

    Also, heuristics work well with longer runs of text, but not with
    shorter runs (names and addresses, quotations, etc.).
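To make the point about run length concrete, here is a minimal sketch of the kind of C/J/K heuristic under discussion, assuming simple counting of characters by Unicode block. This is a swapped-in illustration, not any browser's actual algorithm; the names `guess_cjk_language` and `MIN_RUN_LENGTH` are hypothetical. Kana suggests Japanese and Hangul suggests Korean, but a short run offers too little evidence, so the sketch returns None and leaves the decision to a tag or user preference.

```python
from typing import Optional

# Hypothetical threshold: with fewer CJK characters than this we refuse to guess,
# because short runs (names, addresses, quotations) give too little evidence.
MIN_RUN_LENGTH = 10


def _is_kana(ch: str) -> bool:
    # Hiragana (U+3040..U+309F) and Katakana (U+30A0..U+30FF) blocks.
    return 0x3040 <= ord(ch) <= 0x30FF


def _is_hangul(ch: str) -> bool:
    # Hangul Syllables (U+AC00..U+D7A3) and Hangul Jamo (U+1100..U+11FF).
    cp = ord(ch)
    return 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF


def _is_han(ch: str) -> bool:
    # Main CJK Unified Ideographs block only; extension blocks are ignored here.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def guess_cjk_language(text: str, min_len: int = MIN_RUN_LENGTH) -> Optional[str]:
    """Crudely guess 'ja', 'ko' or 'zh' for a run of CJK text, or return None."""
    kana = sum(_is_kana(c) for c in text)
    hangul = sum(_is_hangul(c) for c in text)
    han = sum(_is_han(c) for c in text)
    if kana + hangul + han < min_len:
        return None  # run too short for a trustworthy guess
    if kana > 0 and kana >= hangul:
        return "ja"  # kana is a strong signal for Japanese
    if hangul > 0:
        return "ko"  # Hangul is a strong signal for Korean
    # Han only: most likely Chinese, but this cannot distinguish Simplified
    # from Traditional, and kanji-only Japanese would be misjudged.
    return "zh"
```

A four-character personal name such as 山田太郎 falls under the threshold and yields None, which is exactly the case where a tag or user preference has to carry the decision.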

    From an implementation standpoint, once you have the ability for
    language to influence font selection, the significant part is done.
    Determining which language to use, whether from a tag, user preference,
    or heuristic, is the easy part.
    I wouldn't have a problem with some precedence rules governing which
    source to use, or even some negotiation where the text clearly belongs
    to a script, and the language influence of the tag, user preference, or
    heuristic is limited to whether its recommendation is appropriate for
    the script. (Hopefully the heuristic is always in line with the script.)
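One way to read the precedence-plus-negotiation idea is as an ordered list of candidate languages, each accepted only if it is plausible for the script the run is written in. The sketch below assumes the order tag, then user preference, then heuristic (the relative order of the last two is my assumption), uses ISO 15924 script codes, and relies on a small hypothetical `LANGUAGE_SCRIPTS` table; a real system would draw on proper locale data.

```python
from typing import Optional

# Hypothetical, deliberately tiny table: primary language subtag -> ISO 15924
# scripts that language is normally written in.
LANGUAGE_SCRIPTS = {
    "en": {"Latn"},
    "fr": {"Latn"},
    "zh": {"Hani"},
    "ja": {"Hani", "Hira", "Kana"},
    "ko": {"Hang", "Hani"},
}


def _primary_subtag(tag: str) -> str:
    # "zh-TW" -> "zh"; "ja" -> "ja"
    return tag.split("-")[0].lower()


def _is_compatible(language: Optional[str], script: str) -> bool:
    if language is None:
        return False
    return script in LANGUAGE_SCRIPTS.get(_primary_subtag(language), set())


def choose_language(
    script: str,
    tag: Optional[str],
    user_pref: Optional[str],
    heuristic: Optional[str],
) -> Optional[str]:
    """Return the first candidate whose language fits the run's script."""
    for candidate in (tag, user_pref, heuristic):
        if _is_compatible(candidate, script):
            return candidate
    return None  # nothing plausible; fall back to default font selection
```

Under this sketch, Han text tagged "en" falls through to the user preference, while Han text mistagged "zh-TW" instead of "zh-CN" keeps its tag, since both are compatible with the script and the mistake cannot be detected at this level.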

    I do need to point out that user preference is problematic if it means
    that, to display a multilingual document, the user has to go through
    and specify font preferences for languages they know nothing about.
    Just because I don't read CJK doesn't mean I don't have legitimate
    needs to display or print CJK in a typographically correct way.
    Think of librarians, commerce exchanges, mailing lists, localizers, etc.

    But although you didn't quite say this, a user could provide a
    preference not for a font but for a language, i.e. if the script is
    CJK, display it as C or J or K (or T). Given the language, the font
    mechanisms would then do something reasonable.


    Mark Davis wrote:
    > > not to replace one broken model (code page = language) with
    > > another broken model (language = font preference).
    > I would add to that that I suspect that given the number of documents
    > that fail to tag with language, or even worse yet, tag with the wrong
    > language, that other approaches may give generally better results. The
    > main area of concern is CJK, and I suspect that in a great many cases
    > the user is probably better off either:
    > - simply using a font set according to the user's own preference, or
    > - having a bit of smarts in the program for heuristically picking
    > among C, J and K.
    > Mark
    > __________
    > ◄ “Eppur si muove” ►
    > ----- Original Message -----
    > From: "Kenneth Whistler" <>
    > To: <>
    > Cc: <>; <>
    > Sent: Thursday, September 26, 2002 16:17
    > Subject: Re: glyph selection for Unicode in browsers
    > > Tex,
    > >
    > > > 3) The language information used to be derived
    > >
    > > dubiously
    > >
    > > > from code page and is
    > > > missing with Unicode, and architecture needs to accommodate a
    > > > better model for bringing language to font selection.
    > >
    > > The archetypal situation is for CJK, and in particular J,
    > > where language choice correlates closely with typographical
    > > preferences, and where character encoding could, in turn,
    > > be correlated reliably with language choice.
    > >
    > > But in general, the connection does not hold, as for data
    > > in any of hundreds of different languages written in Code Page 1252,
    > > for example.
    > >
    > > What you are really looking for, I believe, is a way to
    > > specify typographical preference, which then can be used to
    > > drive auto-selection of fonts.
    > >
    > > I don't think we should head down the garden path of trying
    > > to tie typographical preference too closely to language identity,
    > > however we unknot that particular problem. This could get
    > > you into contrarian problems, where browsers (or other tools)
    > > start paying *too* much attention to language tags, and
    > > automatically (and mysteriously) override user preferences
    > > about the typographical preferences they expect for characters.
    > >
    > > What is needed, I believe, is:
    > >
    > > a. a way to establish typographic preferences
    > > b. a way to link typographical preference choices to
    > > fonts that would express them correctly
    > > c. a way to (optionally) associate a language with
    > > a typographical preference
    > >
    > > And this all should be done, of course, in such a way that
    > > default behavior is reasonable and undue burdens of understanding,
    > > font acquisition, installation, and such
    > > are not placed on end-users who simply want to read and print
    > > documents from the web.
    > >
    > > A tall order, I am sure. But as long as we are blue-skying about
    > > architecture for better solutions, I think it is important
    > > not to replace one broken model (code page = language) with
    > > another broken model (language = font preference).
    > >
    > > --Ken
    > >
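Ken's three pieces (a: typographic preferences, b: links from preferences to fonts, c: an optional language association) can be sketched as a small data model. This is a hypothetical illustration: the class and method names are invented here, and it follows Ken's position that an explicit user choice wins over anything inferred from a language tag, which is the opposite default from the one I argued for above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TypographicPreference:
    """(a) A named typographic preference, e.g. 'Japanese-style Han'."""
    name: str


@dataclass
class TypographyRegistry:
    # (b) preference -> fonts that express it correctly
    fonts: Dict[TypographicPreference, List[str]] = field(default_factory=dict)
    # (c) optional language -> preference association
    language_defaults: Dict[str, TypographicPreference] = field(default_factory=dict)
    # Reasonable default when neither a user choice nor a language applies
    default: Optional[TypographicPreference] = None

    def fonts_for(
        self,
        language: Optional[str] = None,
        override: Optional[TypographicPreference] = None,
    ) -> List[str]:
        # Explicit user choice first, then the language association, then default.
        pref = (
            override
            or (self.language_defaults.get(language) if language else None)
            or self.default
        )
        return self.fonts.get(pref, []) if pref is not None else []


# Example setup for Han text (font names are illustrative only).
JA = TypographicPreference("Japanese-style Han")
ZH_HANS = TypographicPreference("Simplified Chinese-style Han")
registry = TypographyRegistry(
    fonts={JA: ["MS Mincho"], ZH_HANS: ["SimSun"]},
    language_defaults={"ja": JA, "zh-CN": ZH_HANS},
    default=ZH_HANS,
)
```

Here `registry.fonts_for("ja")` follows the language association, while `registry.fonts_for("ja", override=ZH_HANS)` shows the user's choice taking precedence over the tag.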

    Tex Texin   cell: +1 781 789 1898
    Xen Master                
    Making e-Business Work Around the World

    This archive was generated by hypermail 2.1.5 : Fri Sep 27 2002 - 02:13:08 EDT