Michael Jansson wrote:
> A variant of approach 2) is to support pre-composed Unicode text, e.g.
> visually ordered rather than canonically ordered text. For example,
> Arabic Unicode text can easily be reordered, mapped to glyphs and then
> mapped back to code points. This will only work as long as there is a
> code point for every glyph used, e.g. a real character for an Arabic
> ligature. This is fortunately the case for most (although perhaps not
> all) Arabic characters. Such an approach works well for some languages,
> but not all, e.g. it works well for Tamil but not for Hindi (there may
> be a large number of i-vowels, each one with a different width, but
> there is only one code point for this character). Also, a web server
> would have to be able to serve up different formats depending on how
> text-savvy the browser is.
If I understand what you mean, I totally disagree.
Using presentation forms in the document (and, hence, adding more
presentation forms to cover all scripts!) is by no means better or smarter
than using traditional "glyph encodings". It is just Unicode-based glyph
encoding, and it presents exactly the same troubles as ASCII-based glyph
encodings.
OTOH, doing exactly the same thing (i.e., mapping text to presentation
forms) under the application's hood, on the *client's* side, is a simple
but viable way of displaying "complex scripts" when complex fonts are not
available.
Of course, you do need a "code point" for any glyph, but that would be just
an internal "glyph ID": a private convention bound to a certain font format,
which would not affect the way the document is encoded and transmitted.
(BTW, by "private convention" I am not referring to PUA).
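To make the idea concrete, here is a minimal sketch in Python of such a client-side mapping. Everything in it is an assumption made for illustration: the table GLYPH_IDS, its numbers and the function to_glyph_ids belong to an imaginary font, not to any real font format. It only does longest-match ligature substitution and ignores contextual forms and visual reordering.

```python
# Hypothetical glyph table for an imaginary font. The IDs are arbitrary
# internal numbers: they are neither Unicode code points nor PUA values.
GLYPH_IDS = {
    "\u0644": 10,        # ARABIC LETTER LAM
    "\u0627": 11,        # ARABIC LETTER ALEF
    "\u0644\u0627": 12,  # lam-alef ligature glyph
    "\u0628": 13,        # ARABIC LETTER BEH
}
MAX_KEY_LEN = max(len(key) for key in GLYPH_IDS)


def to_glyph_ids(text):
    """Map a logical-order string to glyph IDs, trying the longest match first."""
    glyphs = []
    i = 0
    while i < len(text):
        for n in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + n]
            if chunk in GLYPH_IDS:
                glyphs.append(GLYPH_IDS[chunk])
                i += len(chunk)
                break
        else:
            raise KeyError(f"no glyph for U+{ord(text[i]):04X}")
    return glyphs


document = "\u0628\u0644\u0627"   # what is stored and transmitted
display = to_glyph_ids(document)  # what the client uses internally to draw
```

The document string is never altered; only the client's private glyph list knows about the ligature.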
> Approach three can be achieved as a variant of 2 in turn, by
> mapping back glyphs to code points when possible [...]
Have you tried doing this with Indic scripts? I did, and I must admit that...
I am stuck!
Mapping a string of code points to a string of glyph IDs was relatively
easy; mapping the other way round proved quite tricky.
I still think this backward mapping is possible (or else we'll never see an
Indic OCR), but so far I haven't succeeded in doing it myself.
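As an illustration of why the two directions differ, here is a toy sketch in Python for the Devanagari i-vowel case. All glyph IDs and function names are hypothetical. It handles only a single consonant followed by the vowel sign I; conjuncts, reph and other reordering, which are where the real difficulty lies, are deliberately left out.

```python
I_SIGN = "\u093F"  # DEVANAGARI VOWEL SIGN I

# Hypothetical glyph IDs for an imaginary font.
CONSONANT_GLYPH = {"\u0915": 20, "\u092E": 21}  # KA, MA
I_VARIANT_FOR = {"\u0915": 30, "\u092E": 31}    # width-matched i-sign glyphs

GLYPH_TO_CHAR = {g: c for c, g in CONSONANT_GLYPH.items()}
I_GLYPHS = set(I_VARIANT_FOR.values())


def forward(text):
    """Code points -> glyph IDs: pick an i-sign variant and move it before the consonant."""
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        if i + 1 < len(text) and text[i + 1] == I_SIGN:
            glyphs += [I_VARIANT_FOR[ch], CONSONANT_GLYPH[ch]]
            i += 2
        else:
            glyphs.append(CONSONANT_GLYPH[ch])
            i += 1
    return glyphs


def backward(glyphs):
    """Glyph IDs -> code points: fold every i-sign variant back into the one
    code point and restore logical order (vowel sign after its consonant)."""
    chars = []
    pending_i = False
    for g in glyphs:
        if g in I_GLYPHS:
            pending_i = True  # remember the vowel until its consonant arrives
            continue
        chars.append(GLYPH_TO_CHAR[g])
        if pending_i:
            chars.append(I_SIGN)
            pending_i = False
    return "".join(chars)
```

Even in this tiny case the backward pass needs state (the pending vowel) and a many-to-one table, whereas the forward pass is a plain lookup with one lookahead.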