RE: Can browsers show text? I don't think so

From: Michael Jansson
Date: Thu Jul 04 2002 - 10:53:36 EDT

> Marco Cimarosti wrote:
> > A variant of approach 2) is to support pre-composed Unicode text, e.g.
> If I understand what you mean, I totally disagree.
> Using presentation forms in the document (and, hence, adding more
> presentation forms to cover all scripts!) is by no means better or smarter
> than using traditional "glyph encodings". It is just Unicode-based glyph
> encoding, and it presents exactly the same troubles as ASCII-based glyph
> encodings.

I am not proposing using PUA or introducing new code points to do this. You
would still have valid Unicode characters in the page (of sorts). The
characters would be ordered visually, though, and would carry extra information
to let the user agent (the browser, a.k.a. 'ua') know which alternate form
to use for a character. You can encode this information in other ways too.
For example, our FAIRY system supports enabling font-specific features
(OpenType Layout features), so you can also tag individual characters with
enough information for the ua to render each character in the right form.
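To make the tagging idea concrete, here is a minimal Python sketch of the kind of per-character information a server could compute, so that the ua only has to apply a named OpenType feature (init, medi, fina, isol) instead of understanding Arabic joining itself. The function names are hypothetical, the sketch says nothing about how FAIRY actually encodes its tags, and the joining data is deliberately simplified: only a handful of right-joining letters are known, and diacritics and the lam-alef ligature are ignored.

```python
from __future__ import annotations

# Hypothetical sketch: tag each Arabic letter with the OpenType positional
# feature a ua should apply. Simplified joining data, NOT the full Unicode table.

# A few right-joining letters (they connect to the preceding letter only).
RIGHT_JOINING = set("\u0622\u0623\u0624\u0625\u0627\u0629\u062F\u0630\u0631\u0632\u0648")
# Hamza does not join on either side.
NON_JOINING = {"\u0621"}


def joining_type(ch: str) -> str:
    """Return 'D' (dual), 'R' (right) or 'U' (non-joining) for one character."""
    if not ("\u0620" <= ch <= "\u064A"):
        return "U"  # anything outside the basic Arabic letters breaks the chain
    if ch in NON_JOINING:
        return "U"
    if ch in RIGHT_JOINING:
        return "R"
    return "D"


def positional_tags(word: str) -> list[tuple[str, str | None]]:
    """Pair each character (in logical order) with init/medi/fina/isol, or None."""
    types = [joining_type(ch) for ch in word]
    tagged = []
    for i, (ch, jt) in enumerate(zip(word, types)):
        if jt == "U":
            tagged.append((ch, None))  # no positional feature applies
            continue
        # Joins the preceding letter if that letter can connect forward.
        joins_prev = i > 0 and types[i - 1] == "D"
        # Joins the following letter only if this one is dual-joining.
        joins_next = jt == "D" and i + 1 < len(types) and types[i + 1] in ("D", "R")
        if joins_prev and joins_next:
            tag = "medi"
        elif joins_prev:
            tag = "fina"
        elif joins_next:
            tag = "init"
        else:
            tag = "isol"
        tagged.append((ch, tag))
    return tagged
```

For example, for U+0628 U+0627 U+0628 (beh, alef, beh) this yields init, fina, isol, because alef never connects to the letter after it.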

The benefit is that the ua does not need to "understand" the language. The
downside is that you would have visually ordered Unicode text. This is less
than ideal of course, and it requires a smart web server or a smart page
creation tool to create the pages. The pages would probably be rather large
as well.
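As a rough illustration of what "visually ordered" text means, the following Python sketch reorders one line with a left-to-right base direction by reversing its right-to-left runs. It is a drastic simplification of the Unicode Bidirectional Algorithm (UAX #9): only characters in the Hebrew and Arabic blocks count as right-to-left, only whitespace counts as neutral, and numbers, mirrored brackets and explicit embeddings are ignored. The function names are hypothetical.

```python
def classify(ch: str) -> str:
    """Crude direction class: 'R' (Hebrew/Arabic block), 'N' (whitespace) or 'L'."""
    if "\u0590" <= ch <= "\u06FF":
        return "R"
    if ch.isspace():
        return "N"
    return "L"


def logical_to_visual(line: str) -> str:
    """Reorder a logically ordered line (LTR base direction) into visual order."""
    classes = [classify(ch) for ch in line]
    n = len(classes)

    # Resolve each run of neutrals: it becomes R only if strong R text sits on
    # both sides; otherwise it takes the LTR base direction.
    i = 0
    while i < n:
        if classes[i] != "N":
            i += 1
            continue
        j = i
        while j < n and classes[j] == "N":
            j += 1
        before = classes[i - 1] if i > 0 else "L"
        after = classes[j] if j < n else "L"
        resolved = "R" if before == after == "R" else "L"
        for k in range(i, j):
            classes[k] = resolved
        i = j

    # Emit runs left to right, reversing the characters of each R run.
    out = []
    i = 0
    while i < n:
        j = i
        while j < n and classes[j] == classes[i]:
            j += 1
        run = line[i:j]
        out.append(run[::-1] if classes[i] == "R" else run)
        i = j
    return "".join(out)
```

A page generated this way stores the already-reversed Hebrew run, which is exactly why the ua no longer needs to know anything about the script.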

The better solution, of course, would be if the browsers truly supported

> OTOH, doing exactly the same thing (i.e., mapping text to presentation
> forms) under the application's hoods, on the *client's* side, is a simple
> but viable way of displaying "complex scripts" when complex fonts are not
> available.
> Of course, you do need a "code point" for any glyph, but that would be just
> an internal "glyph ID": a private convention bound to a certain font format,
> which would not affect the way the document is encoded and transmitted.

I don't see why you need code points in this case. A ua can easily render
text with nothing but glyph indices as input, i.e. let the OS know that it
needs to access a specific glyph, or even render that glyph itself. The
problem, of course, is what to do if you don't have the font that was used
to produce the indices. You may not know how to map indices between two
fonts. A naming convention that I proposed in my previous message would
solve that problem.
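The naming convention from the previous message is not reproduced in this one, but the general idea can be sketched in Python under one stated assumption: both fonts name their glyphs by a shared scheme (here, Adobe Glyph List style names such as uni0915, with underscores joining the components of a ligature). A glyph index in the source font is turned into a name via that font's glyph order, then looked up in the target font's glyph order; anything the target lacks falls back to glyph 0, which by TrueType convention is .notdef. The function name and both glyph orders are made up for illustration.

```python
def remap_glyph_indices(indices, source_order, target_order):
    """Map glyph indices from one font to another via shared glyph names.

    source_order / target_order: lists where position i holds the name of glyph i.
    Glyphs missing from the target font map to 0 (.notdef).
    """
    target_index = {name: gid for gid, name in enumerate(target_order)}
    return [target_index.get(source_order[gid], 0) for gid in indices]


# Hypothetical glyph orders for two Devanagari fonts.
SOURCE_ORDER = [".notdef", "uni0915", "uni094D", "uni0937", "uni0915_uni094D_uni0937"]
TARGET_ORDER = [".notdef", "uni0937", "uni0915_uni094D_uni0937", "uni0915", "uni094D"]

if __name__ == "__main__":
    # ka and the k.ssa conjunct, as indexed in the source font
    print(remap_glyph_indices([1, 4], SOURCE_ORDER, TARGET_ORDER))  # [3, 2]
```

The point of the design is that the page only ever carries indices plus an agreed naming scheme; no code point, standard or private, is needed for the conjunct glyph.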

> (BTW, by "private convention" I am not referring to PUA).
> > Approach three can be achieved as a variant of 2 in turn, by
> > mapping back glyphs to code points when possible [...]
> Have you tried doing this with Indic scripts? I did and I must admit that...
> I am stuck!

Yes, our web server software FAIRY does this, optionally without using any
PUA characters. Unfortunately, some platforms do not support visually
ordered Unicode for some languages. For example, Windows XP will show
visually ordered Hebrew and Arabic, but will show bad characters for
visually ordered Hindi and Tamil. You may thus have to use PUA characters
(or the equivalent) to prevent the OS from doing additional glyph
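The "map glyphs back to code points when possible" step can be sketched in Python as follows. This is not FAIRY's implementation, and the function name is hypothetical. The sketch inverts a font's cmap so that any glyph reachable from a code point is written as that character, and gives every other glyph (ligatures, half forms, contextual variants) a code point from the Private Use Area starting at U+E000. The force_pua flag models the workaround described above for platforms that would otherwise reshape visually ordered Indic text.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area
PUA_END = 0xF8FF    # last code point of the BMP Private Use Area


def glyphs_to_text(glyph_ids, cmap, pua_table, force_pua=False):
    """Turn visually ordered glyph IDs into a string of code points.

    cmap: dict code point -> glyph ID (the font's character map).
    pua_table: dict glyph ID -> PUA code point; filled in as glyphs are assigned.
    force_pua: if True, use PUA for every glyph so the OS cannot reshape the text.
    """
    # Invert the cmap; if several code points share a glyph, keep the lowest.
    reverse = {}
    for cp, gid in sorted(cmap.items()):
        reverse.setdefault(gid, cp)

    chars = []
    for gid in glyph_ids:
        if not force_pua and gid in reverse:
            chars.append(chr(reverse[gid]))
            continue
        # No standard code point (or PUA forced): reuse or allocate a PUA slot.
        if gid not in pua_table:
            cp = PUA_START + len(pua_table)
            if cp > PUA_END:
                raise ValueError("ran out of BMP Private Use Area code points")
            pua_table[gid] = cp
        chars.append(chr(pua_table[gid]))
    return "".join(chars)
```

The pua_table would then have to travel with a matching font, since the PUA assignments only mean something to a font built from the same table.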

> Mapping a string of code points to a string of glyph ID's was relatively
> easy; mapping the other way round proved quite tricky.
> I still think this backward mapping is possible (or else we'll never see an
> Indic OCR), but so far haven't succeeded doing it myself.

It's not impossible, so keep going...
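One reason the backward direction is harder, for Devanagari at least, is that a single glyph can stand for several code points, and some glyphs are drawn in a different position from their logical order: the short i vowel sign (U+093F) appears to the left of the consonant it follows in memory. The Python sketch below handles only those two cases, using a hypothetical table from glyph ID to the code point sequence the glyph was produced from. It assumes the cluster after the vowel sign is a single glyph, and leaves out reph, split vowels and glyphs that could have come from more than one sequence.

```python
I_MATRA = 0x093F  # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant


def glyphs_to_logical(glyph_ids, glyph_to_cps):
    """Recover logically ordered text from visually ordered glyph IDs.

    glyph_to_cps: dict glyph ID -> tuple of code points the glyph represents
    (e.g. a conjunct ligature maps to consonant + virama + consonant).
    """
    out = []
    pending_matra = False
    for gid in glyph_ids:
        cps = glyph_to_cps[gid]
        if cps == (I_MATRA,):
            # Seen before its consonant; emit it after the next glyph instead.
            pending_matra = True
            continue
        out.extend(cps)
        if pending_matra:
            out.append(I_MATRA)
            pending_matra = False
    if pending_matra:  # stray vowel sign with nothing after it
        out.append(I_MATRA)
    return "".join(chr(cp) for cp in out)
```

With a table where glyph 3 is the k.ssa conjunct, the visual sequence [vowel sign i, conjunct] comes back as U+0915 U+094D U+0937 U+093F, which is the logical spelling.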

> _ Marco

em2 Solutions
Michael Jansson

This archive was generated by hypermail 2.1.2 : Thu Jul 04 2002 - 09:19:36 EDT