Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon Jul 17 2006 - 19:23:43 CDT

Next message: Erkki Kolehmainen: "Re: Frequent incorrect guesses by the charset autodetection in IE7"

Previous message: Samuel Thibault: "Re: Frequent incorrect guesses by the charset autodetection in IE7"
In reply to: Philippe Verdy: "Re: Frequent incorrect guesses by the charset autodetection in IE7"
Next in thread: Sinnathurai Srivas: "Re: Frequent incorrect guesses by the charset autodetection in IE7"

Adam Twardoch wrote on Monday, July 17, 2006 at 10:38 PM
Subject: Re: Frequent incorrect guesses by the charset autodetection in IE7

> Sinnathurai Srivas wrote:
>> Unicode email is not working properly
>> ISO Email is working properly.
>> Hacked 8bit coding works among vendor opposition.
> I’ve been using UTF-8 in my e-mail for several years now. I don’t know
> what you mean.

I'll hazard a guess. One issue is that many applications choose the font on
the basis of the encoding. It is by no means unusual to see 'Unicode' in a
pick list of languages! I notice these differences in Outlook Express
because I use a mixture of Unicode, Latin-1 and Thai encodings. There is a
further problem that fonts are scaled to a common pitch rather than a common
x-height. I generally get round it by selecting Tahoma for Thai because it
has a rather high x-height to pitch ratio for Thai - at the expense of
making some vowel-tone combinations hard to distinguish. Of course, you
might have exactly the same problem if you were to use a 7- or 8-bit ISO-2022
scheme, though I suppose it might switch font according to character set.
Word 2002 seems to do something similar, presumably working off Unicode
blocks, and has styles that specify different fonts and pitches for
different scripts.
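
As a rough illustration of choosing a font per Unicode block, here is a
minimal Python sketch. It is not how Word or Outlook Express actually work;
the table, the function names and the font assignments are hypothetical, and
only the block boundaries are taken from the Unicode standard.

```python
from itertools import groupby

# Hypothetical range-to-font table. "Latin" here spans Basic Latin through
# Latin Extended-B; the font names are illustrative assumptions only.
BLOCK_FONTS = [
    (0x0000, 0x024F, "Latin", "Times New Roman"),
    (0x0B80, 0x0BFF, "Tamil", "Latha"),
    (0x0E00, 0x0E7F, "Thai", "Tahoma"),
]


def classify(ch):
    """Return (script label, font) for one character, by code point range."""
    cp = ord(ch)
    for start, end, script, font in BLOCK_FONTS:
        if start <= cp <= end:
            return script, font
    return "Other", "fallback"


def font_runs(text):
    """Split text into consecutive runs that share the same (script, font)."""
    return [
        (script, font, "".join(chars))
        for (script, font), chars in groupby(text, key=classify)
    ]


# "Thai", then the Thai word U+0E44 U+0E17 U+0E22, then " text".
for run in font_runs("Thai \u0e44\u0e17\u0e22 text"):
    print(run)
```

A scheme like this says nothing about scaling: each run would still need its
own size if the fonts are to match on x-height rather than on pitch.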

Philippe Verdy wrote on Monday, July 17, 2006 10:10 PM

> From: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>
>> Collation (after 15 years) is not yet working in Unicode.
> Huh???? Completely wrong.

It's still taking its time to work through. Thai collation in Excel 2000
and 2002 has to be seen to be believed. I can only believe that someone
misinterpreted the specification - perhaps he knew just a little about
Devanagari, and misapplied it to Thai. Or perhaps the error lay with the
composition of the specification, so it's a design fault rather than a bug.
On the other hand, Thai collation in Word 2002 seems to work.

There are tools that will do collation, but being able to sort a table in
Word that contains a mixture of Tamil and English script to the satisfaction
of a Tamil non-programmer resident in England may yet be another matter.
(I've seen a commercial, or at least non-free, add-on to do Lao collation in
Word.) As to sorting such a table to the satisfaction of a Malayalee...

>> ISO collation works very well.
> If you are speaking about the binary ordering, this is not collation, and
> there exists *NO* 7-8bit encoding whose binary encoding supports the
> conventions used in different languages.
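
The difference between binary ordering and collation can be seen with Thai,
where the vowels U+0E40..U+0E44 are stored before the consonant they follow
in pronunciation. The Python sketch below is an assumption-laden toy, not a
real collation implementation: it only swaps each such vowel with the next
character before comparing, and ignores tone marks and everything else a
full collation would weigh. The name thai_sort_key is made up for the
example.

```python
# Thai preposed vowels SARA E .. SARA AI MAIMALAI (U+0E40..U+0E44).
THAI_PREVOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def thai_sort_key(word):
    """Toy sort key: move each preposed vowel after the following character."""
    chars = list(word)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in THAI_PREVOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair just swapped
        else:
            i += 1
    return "".join(chars)


# U+0E40 U+0E01 (SARA E + KO KAI) and U+0E02 U+0E32 (KHO KHAI + SARA AA).
words = ["\u0e40\u0e01", "\u0e02\u0e32"]

binary_order = sorted(words)                  # plain code point order
reordered = sorted(words, key=thai_sort_key)  # consonant-first comparison

print(binary_order)
print(reordered)
```

Code point order puts the word beginning with KHO KHAI first, because U+0E02
is less than U+0E40. With the swap, the word whose consonant is KO KAI comes
first, which is the order a Thai dictionary uses.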

>> Unicode word processing works with few vendor applications with immense
>> difficulties.
> There's no difficulty today.

Half-full v. half-empty. While most scripts may work well, there are and
have been problem areas like Malayalam, Burmese and Bengali. However, I
find it hard to believe that ISCII works any better for Malayalam and
Bengali. On the other hand, adapting typewriter solutions ought to work -
*provided* the typewriter solution works! Part of the problem a Windows
user faces is that he can't override Uniscribe. If he thinks he knows
better than Uniscribe, then he has to eschew Microsoft products.

An important deficiency in the Uniscribe implementation of the Tamil script
is that one cannot use the superscript or subscript digits on all
combinations of consonant and vowel. The Unicode standard mentions these
combinations, but does not say how to encode them. I don't believe new
characters are needed, so I don't know how one can tackle this omission.

A possible example of the issues is the Unicode standard. Much of the
non-Latin text does not appear to have been composed in Unicode! For a new
complex script that is inevitable, and of course the text in a new script
for a script proposal cannot be encoded in Unicode. (It could in theory be
encoded in the PUA, but I'm not sure that that happens much.)

Some scripts are well integrated. For example, there are fonts that are
encoded as hacks on Unicoded Thai.

Tamil illustrates some of the problems. When SHA (U+0BB6) was added in Unicode
4.1.0, one could not go out and use it with all applications. Uniscribe
refused to combine it with Tamil vowels, let alone form the 'shri' ligature.
Imagine finding that the only way to have your name displayed was to
misspell it!

At least now one can get around the Uniscribe limitation for HTML on Windows
if you are desperate. Deer Park supports Graphite, which allows one to
specify one's own Indic re-arrangement etc, and Graphite comes with a good
tutorial to get you started. Graphite does not seem to allow the
pixel-level control of positioning available in OpenType lay-out tables.

Of course, it would be nice if Uniscribe could allow a font to opt out of
its automatic re-ordering and do its own thing. I think this would still be
in accord with the principle of not needlessly duplicating information.

Richard.


This archive was generated by hypermail 2.1.5 : Mon Jul 17 2006 - 19:28:15 CDT