Chris Pratley wrote on 1997-07-08 02:33 UTC:
> Although a configurable option is a possible solution, we know that the
> typical user (representing around 95-98% of users) never changes
> defaults in a program, especially something as obscure as encoding
> options. As you may know it is very popular to attack Microsoft for "UI
> bloat", and this would no doubt add to that IMHO. But assuming we have
> options, "which one do you default to?" is the $64000 question.
Well, it certainly will not do any harm to offer all possible
options in a somewhat hidden way, say by letting users select the option
in the Windows Registry or some configuration file. This would at
least allow people like MSNBC who have already identified and understood
the problem to make the appropriate switch in a minute instead of
having to "work hard on a fix for the problem". In the MSNBC case,
the optimal choice is certainly Latin-1 downconversion.
> If you did have options, you could label the options you list as:
> a) compatible with 1997 browsers and later
> b) compatible with 1997 browsers and later
> c) modify contents of document to be readable in all browsers.
> Warning: some contents may appear different from your original document
And no one would understand any more what these options are about.
It is not possible to understand the difference between these options
if they are not labeled with precise terminology (Unicode, numeric
character reference, ISO 8859-1, etc.). The label texts you suggest
are a user interface nightmare that I have encountered much too often
on Windows systems: By suppressing precise vocabulary, you give the
inexperienced user the impression that she knows what is going on
(without actually affecting in any way the level of understanding),
while giving at the same time the expert user a very hard time
figuring out what these "user friendly" options stand for.
The user interface that I would prefer is:
Character Set Compatibility Options
Advanced configuration: You normally do *not* want to change these
settings unless you have a specific requirement for the way certain
Windows specific characters are represented such that they can be
processed on old or non-Windows browsers.
How shall Windows encode CP1252 characters in the code range 128-159
that are not part of ISO 8859-1, the classical HTML character set
(e.g., the smart quotes and the trademark sign)?
1) Use Unicode numeric character references: this is the encoding that
follows strictly the HTML standard. This will not display some
characters on old browsers without Unicode support.
2) Use Unicode UTF-8: this is a modern, more compact encoding that follows
strictly the HTML standard and allows easier editing on some Unix
systems. This will not display some characters on old browsers
without Unicode support.
3) Use only ISO Latin-1 characters: Replace some Windows-specific
characters with similar substitutes that are guaranteed to
be displayable on even the oldest Web browser.
4) Use native Windows character set (CP1252): This option will encode
all characters such that they are correctly displayed on even the
oldest Windows browser, but most likely not on other platforms.
Use this option only when you know that only Windows browsers
will view the file (e.g., on Intranets) and Option 1) is not
acceptable because some of them are old pre-Unicode versions
that have not yet been updated.
Default is 1). If you get complaints from people with old browsers,
we recommend 3), unless you do not want characters to be changed
and are sure that all browsers are running on Windows, in which
case we recommend 4). Option 2) is available for special applications
and experimental purposes. We recommend not using it unless you know
that you want a UTF-8 file in order to edit it on another platform.
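
To make the four options concrete, here is a minimal Python sketch of what each
one would write for a short string containing smart quotes and the trademark
sign. The function names, the sample string, and the Latin-1 substitution table
are my own illustrative assumptions; they are not taken from any Microsoft
product.

```python
# Sample text: left/right double smart quotes and the trademark sign,
# all of which live in the CP1252 range 128-159.
SAMPLE = "\u201cHello\u201d \u2122"

# Assumed substitution table for option 3; a real product would pick its own.
LATIN1_FALLBACK = {
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
    "\u2013": "-", "\u2014": "--",
    "\u2026": "...", "\u2122": "(TM)",
}

def encode_ncr(text):
    # Option 1: keep Latin-1 bytes, write everything else as &#NNNN;
    return text.encode("latin-1", errors="xmlcharrefreplace")

def encode_utf8(text):
    # Option 2: plain UTF-8
    return text.encode("utf-8")

def encode_latin1_downconvert(text):
    # Option 3: swap Windows-specific characters for Latin-1 look-alikes
    replaced = "".join(LATIN1_FALLBACK.get(ch, ch) for ch in text)
    return replaced.encode("latin-1", errors="replace")

def encode_cp1252(text):
    # Option 4: native Windows bytes in the 0x80-0x9F range
    return text.encode("cp1252")

for encoder in (encode_ncr, encode_utf8, encode_latin1_downconvert, encode_cp1252):
    print(encoder.__name__, encoder(SAMPLE))
```

Only option 3 changes the text itself; the other three differ solely in how the
same characters are written to the file.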
If you are concerned about the default, you can still implement this
menu now (so that customers like MSNBC can select option 3) and use
option 4) as the default for the moment. Two releases later, you make
1) the default when 95% of your customers have Unicode browsers.
If you are concerned about the amount of text, you can move
all of this into a help screen easily accessible from the menu.
> Now, if your competitor offered this option:
> d) Compatible with all browsers used _in your company_
> you would have a hard time competing. (Note the emphasis on "in your
> company" in the fourth option, meaning the customer's company. You could
> even go on to say "most browsers on the Internet", but that got me in
> trouble last time :-))
> Erik raised an option of writing the actual byte value of the characters
> in the file. It was my understanding that this can cause trouble in some
> Unix servers that are not expecting byte values in the 0x80-0x9F range.
> Can someone comment here?
If you check my reply again, you'll find that I also suggested exactly
the same solution there (see option 4 above):
>> - output directly in CP1252 bytes (not NCR!) and make sure that the
>> IANA registry contains a reasonable MIME entry for CP1252 and that
>> the HTTP server will announce CP1252 as the encoding.
It is not really in the interest of finding a simple common denominator
among all platforms, but it is formally better than using the
I would be surprised if Unix servers have problems with bytes in the
C1 range. They should normally just pass these values on transparently.
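
A small Python sketch of why the byte values themselves are harmless in transit
but ambiguous without a charset label. The relay() function is a hypothetical
stand-in for a byte-transparent server, and the header string is only an
example of announcing the encoding under the registered name windows-1252.

```python
import unicodedata

# CP1252 bytes for: left smart quote, "Hi", right smart quote.
raw = bytes([0x93, 0x48, 0x69, 0x94])

def relay(payload: bytes) -> bytes:
    # Hypothetical byte-transparent server: returns exactly what it received.
    return bytes(payload)

received = relay(raw)

# Interpreted as CP1252, the bytes are printable smart quotes.
as_cp1252 = received.decode("cp1252")

# Interpreted as ISO 8859-1, the same bytes are C1 control characters.
as_latin1 = received.decode("latin-1")
c1_controls = [ch for ch in as_latin1 if unicodedata.category(ch) == "Cc"]

# Example header a server could send so that browsers pick the first reading.
content_type = "text/html; charset=windows-1252"

print(received == raw, repr(as_cp1252), len(c1_controls), content_type)
```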
-- Markus G. Kuhn, Computer Science grad student, Purdue University, Indiana, USA -- email: firstname.lastname@example.org