> Marco Cimarosti wrote:
> > Take Polynesian languages for example, e.g. Hawaiian. [...]
> Curiously, Hawaiian could be supported by the "Baltic Latin" subset
> (http://czyborra.com/charsets/iso8859.html#ISO-8859-4) -- although I guess
> they'd have some problems interpreting menus in Estonian or Latvian. :-)
FYI - The problem they have is with the "okina" character, which looks like
an upside-down apostrophe but does not behave like one (it does not break a
word). You won't find it in ISO 8859-4. Also, although many fonts (e.g. the
Windows core fonts) support all the Hawaiian vowel characters, they do not
include the okina. This is unfortunately a quite common and important
character: leaving it out can change the meaning of a word.
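The distinction matters in software, not just in fonts. The okina is U+02BB MODIFIER LETTER TURNED COMMA, which Unicode classifies as a letter, so word segmentation treats it differently from an ASCII apostrophe. A minimal Python sketch (the Hawaiian example word is mine):

```python
import re
import unicodedata

# The okina is U+02BB MODIFIER LETTER TURNED COMMA.  Unicode classes it
# as a letter (general category Lm), so unlike an apostrophe it stays
# inside a word when text is segmented.
OKINA = "\u02BB"

print(unicodedata.name(OKINA))        # MODIFIER LETTER TURNED COMMA
print(unicodedata.category(OKINA))    # Lm  (modifier letter)
print(unicodedata.category("'"))      # Po  (punctuation)

# A regex word scan keeps the okina inside the word, but splits the word
# in two when an ASCII apostrophe is used as a stand-in:
print(re.findall(r"\w+", "Hawai" + OKINA + "i"))  # one word
print(re.findall(r"\w+", "Hawai'i"))              # two words
```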
You'll find some interesting reading about Hawaiian at:
which, incidentally, is a web site powered by web fonts. (The site is still
being developed, so it may be down or contain rough pages.)
> > A lot of people that rely on Unicode-only scripts use (often
> > incompatible) proprietary encoding schemes. They are often
> > forced to use machines installed with another language (e.g.
> > English) when using these schemes. Web browsers should be
> > able to accommodate these people without having to rely on
> > the OS for Unicode support.
> Why!? These proprietary "encoding" schemes often have serious functional
> and compatibility problems, so the way to go is trying to wipe them out of
> history as soon as possible.
I expressed myself poorly before. I could not agree more with you that
proprietary encodings should be phased out. My point is that web browsers
should support Unicode on these platforms precisely so that such hack
encodings can be phased out.
> In principle, Unicode support is not rocket science, and does not
> necessarily have to be at the OS level. Browsers for older OS's can
> implement Unicode themselves, using libraries such as ICU.
My point exactly. You don't have to be an OS or library engineer to write
Unicode-capable software. I do believe that (at least some) browser engineers
include Unicode support beyond what the OS provides. It is not nearly enough,
though; support for fonts in particular is lacking.
> > The problem today is that browsers typically do not support
> > web fonts, let alone in a smart way. It's the
> > implementations and not the technology that is to blame.
> Agreed. I think web font technology may take two alternative roads:
> 1) Trying to match (part of) the "complex script" functionality offered by
> modern font technologies (OpenType or its competitors);
> 2) Defining a standard glyph code and a standard glyphization process for
> Unicode. The client should then be required to run this glyphization
> process behind the scenes, and then to load glyphs from fonts using the
> standard glyph codes.
> Approach 1 is much more flexible, but approach 2 is easier to implement
> and requires less data in the fonts.
A variant of approach 2) is to support pre-composed Unicode text, i.e.
visually rather than canonically ordered text. For example, an Arabic Unicode
text can easily be reordered, mapped to glyphs, and then mapped back to code
points. This only works as long as there is a code point for every glyph
used, e.g. a real character for an Arabic ligature. This is fortunately the
case for most (although perhaps not all) Arabic glyphs. Such an approach
works well for some languages but not all: it works well for Tamil but not
for Hindi (there may be a large number of i-vowel glyphs, each with a
different width, but there is only one code point for this character). Also,
a web server would have to be able to serve up different formats depending
on how text-savvy the browser is.
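The "map glyphs back to code points" step can be illustrated with Unicode's Arabic Presentation Forms blocks (U+FB50..U+FDFF, U+FE70..U+FEFF), which assign code points to positional glyph forms and ligatures. Their compatibility decompositions recover the underlying characters. A small sketch using Python's stdlib:

```python
import unicodedata

# U+FEFB is ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM -- a "glyph"
# that happens to have its own code point, as described in the text.
lam_alef_isolated = "\uFEFB"

# Its compatibility decomposition names the positional form and lists
# the base code points:
print(unicodedata.decomposition(lam_alef_isolated))  # '<isolated> 0644 0627'

# NFKC normalization folds the presentation form back to the base
# characters (lam U+0644, alef U+0627):
base = unicodedata.normalize("NFKC", lam_alef_isolated)
print([f"U+{ord(c):04X}" for c in base])
```

This round trip is exactly what fails for a Hindi i-vowel of a particular width: there is no code point for each width variant, so nothing to map the glyph back to.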
A third approach can in turn be achieved as a variant of 2), by mapping
glyphs back to code points when possible and then creating HTML literals
that include the name of alternate forms of the characters. (Providing names
for alternate forms of characters is, incidentally, a popular topic on the
OpenType mailing list.) For example, the wide alternate form of a Hindi i
(U+093F) could be encoded as "&#x93F.wide;" (or "&#x93F.x-wide;", etc.).
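Such literals would be easy to recognize mechanically. A minimal sketch of a parser for this kind of markup; the syntax (a numeric character reference carrying a variant-name suffix, e.g. "&#x93F.wide;") and the variant names are hypothetical, taken only from the example above:

```python
import re

# Hypothetical markup: an HTML-style numeric character reference with a
# named alternate-glyph suffix, e.g. "&#x93F.wide;".  Both the syntax
# and the variant names are assumptions for illustration only.
LITERAL = re.compile(r"&#x([0-9A-Fa-f]+)\.([A-Za-z-]+);")

def parse_variant_literals(text):
    """Return (code point, variant name) pairs found in the text."""
    return [(int(hexcode, 16), variant)
            for hexcode, variant in LITERAL.findall(text)]

print(parse_variant_literals("&#x93F.wide; and &#x93F.x-wide;"))
# [(2367, 'wide'), (2367, 'x-wide')]
```

A renderer could then look up the named alternate form in the font (e.g. via an OpenType feature) and fall back to the plain U+093F glyph when the variant is unavailable.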
> In both cases, web documents should be encoded only with Unicode or other
> standard encodings, not with codes referring to glyphs in a particular
> font.
> _ Marco
This archive was generated by hypermail 2.1.2 : Thu Jul 04 2002 - 06:36:40 EDT