Thanks for the ideas. What you describe is close to what I was planning
to do for the advanced settings, including the help file page with
descriptions of the ramifications of each choice. The labels I gave in
my example were not what I was suggesting as actual labels - that is
just what the options mean in reality (e.g. #1 and #2 require the newest
browsers). But it is pretty hopeless to solve the general problem that
way. Your design is great for a well-educated technical person who may
not be familiar with this exact issue, but has a good head for software.
I've spent several years doing usability studies of real consumer and
corporate users. I think you are overestimating the average
non-technical person's tolerance for jargon and technical details. I'm
not against having buried options (even registry entries or config
files!) for the technical user, but expecting any normal person to have
any patience for having to mess with "encodings" is asking for it. It is
the kind of thing where people do not accept any explanation - it should
just work, and if it doesn't work, then the software is at fault.
For example, in your text you say, "This will not display some
characters on old browser without Unicode support." Right away you've
lost the majority of people. Unicode? What is that? Browser? How do I
know what an old one is, and on which ones won't "characters" show up?
By the way, what's a "character"? What part of my web page won't be
displayed? Explain to me why your software doesn't work properly...
In testing controls like the one you described, I've found that people
who are not serious technical users fail utterly to use them. This is
why I am looking for some other ideas, and to get a feel for what other
designers are doing.
As an aside, you're leaving out most of the rest of the world with your
design. There are many other encodings (Shift-JIS, JIS-0208, EUC-JP,
Big5, GB2312, KOI8-R, etc.) in use, and people in each local
market expect our software to support them. So it gets a little more
complicated.
The sad fact is that if we default to a solution like #1, then we invite
a huge number of calls to technical support asking why things don't look
the same in the browser as they did in the authoring tool. Each call
costs something like $25 on average, which very rapidly eats into profit,
and hence into the whole point of making the software. It's not an easy
decision, and if you do the math, it's a lot of money.
I really hope new browsers proliferate quickly so it will be possible to
default to this setting soon. At the moment, defaulting to #1 would
cause trouble for a larger absolute number of people than using the illegal
encodings (#4) does. Is anyone ready to take the plunge and break
backward compatibility _by default_ in order to conform to the emerging
standard?
From: Unicode Discussion [SMTP:email@example.com]
Sent: Monday, July 07, 1997 8:14 PM
To: Multiple Recipients of
Subject: Re: Usage of CP1252 characters on www.msnbc.com
Chris Pratley wrote on 1997-07-08 02:33 UTC:
> Although a configurable option is a possible solution, we know that the
> typical user (representing around 95-98% of users) never changes
> defaults in a program, especially something as obscure as encoding
> options. As you may know it is very popular to attack Microsoft for "UI
> bloat", and this would no doubt add to that IMHO. But assuming we had
> options, "which one do you default to?" is the $64000 question.
Well, it certainly will not do any harm to offer all possible
options in a somewhat hidden way, say by allowing users to select the
option in the Windows Registry or some configuration file. This would at
least allow people like MSNBC, who have already identified the
problem, to make the appropriate switch in a minute instead of
having to "work hard on a fix for the problem". In the MSNBC case,
the optimal choice is certainly Latin-1 downconversion.
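To make the scope of a downconversion concrete, here is a small Python sketch (an illustration, not code from this thread; `cp1252_only_characters` is a hypothetical name) that lists the CP1252 byte values in 0x80-0x9F that decode to printable characters. Since CP1252 agrees with ISO 8859-1 everywhere else, these are the characters a Latin-1 downconversion has to replace. It relies on Python's built-in `cp1252` codec, whose table is the current one and may include characters added after this discussion (for example, the euro sign at 0x80).

```python
import unicodedata


def cp1252_only_characters() -> list[tuple[int, str]]:
    """List (byte, character) pairs for the printable CP1252 bytes 0x80-0x9F.

    In ISO 8859-1 these byte values are C1 control codes, so none of the
    characters CP1252 assigns here has a Latin-1 equivalent.
    """
    pairs = []
    for byte in range(0x80, 0xA0):
        try:
            ch = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Python's codec treats 0x81, 0x8D, 0x8F, 0x90 and 0x9D as undefined.
            continue
        pairs.append((byte, ch))
    return pairs


# Print an ASCII-only table: byte value, Unicode code point, character name.
for byte, ch in cp1252_only_characters():
    print(f"0x{byte:02X}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The smart quotes (0x91-0x94) and the trademark sign (0x99) mentioned in this thread all show up in that list.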
> If you did have options, you could label the options you list as:
> a) compatible with 1997 browsers and later
> b) compatible with 1997 browsers and later
> c) modify contents of document to be readable in all browsers
> Warning: some contents may appear different from your original.
And no one would understand any more what these options are about.
It is not possible to understand the difference between these options
if they are not labeled with precise terminology (Unicode,
character reference, ISO 8859-1, etc.). The label texts you suggest
are a user interface nightmare that I have encountered much too often
on Windows systems: by suppressing precise vocabulary, you give the
inexperienced user the impression that she knows what is going on
(without actually affecting in any way the level of understanding),
while at the same time giving the expert user a very hard time
figuring out what these "user friendly" options stand for.
The user interface that I would prefer is:
Character Set Compatibility Options
Advanced configuration: You normally do *not* want to change these
settings unless you have a specific requirement for the way
Windows-specific characters are represented such that they can be
processed on old or non-Windows browsers.
How shall Windows encode those CP1252 characters
that are not part of ISO 8859-1, the classical HTML character set
(e.g., the smart quotes and the trademark sign)?
1) Use Unicode numerical character references: this option
follows the HTML standard strictly. Some characters will not display
on old browsers without Unicode support.
2) Use Unicode UTF-8: this is a modern, more compact encoding that
follows the HTML standard strictly and allows easier editing on other
systems. Some characters will not display on old browsers
without Unicode support.
3) Use only ISO Latin-1 characters: replace some Windows-specific
characters by similar replacements that are guaranteed to
be displayable on even the oldest Web browser.
4) Use native Windows character set (CP1252): this option will encode
all characters such that they are correctly displayed on even the
oldest Windows browser, but most likely not on other platforms.
Use this option only when you know that only Windows browsers
will view the file (e.g., on intranets) and option 1) is not
acceptable because some of them are old pre-Unicode browsers
that have not yet been updated.
Default is 1). If you get complaints from people with old browsers,
we recommend 3), except if you do not want characters to be replaced
and are sure that all browsers are running on Windows, in which
case we recommend 4). Option 2) is available for special
and experimental purposes; we recommend not using it unless you know
that you want a UTF-8 file in order to edit it on another system.
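The four choices in this proposed dialog map fairly directly onto standard codec operations. The Python sketch below is only an illustration: `encode_option` and `LATIN1_FALLBACK` are hypothetical names, and the replacements in that table are assumptions, not any product's actual downconversion table. It also assumes the text is already HTML-escaped where needed, so only the character encoding is at issue.

```python
# Hypothetical replacement table for option 3 (Latin-1 downconversion).
# These particular choices are illustrative assumptions.
LATIN1_FALLBACK = {
    "\u2018": "'", "\u2019": "'",    # single smart quotes
    "\u201c": '"', "\u201d": '"',    # double smart quotes
    "\u2013": "-", "\u2014": "--",   # en dash, em dash
    "\u2026": "...",                 # horizontal ellipsis
    "\u2122": "(TM)",                # trademark sign
}


def encode_option(text: str, option: int) -> bytes:
    """Serialize text according to options 1) to 4) of the proposed dialog."""
    if option == 1:
        # Latin-1 where possible; everything else becomes a decimal numeric
        # character reference to its Unicode code point, e.g. &#8220;.
        return text.encode("latin-1", errors="xmlcharrefreplace")
    if option == 2:
        return text.encode("utf-8")
    if option == 3:
        downconverted = "".join(LATIN1_FALLBACK.get(ch, ch) for ch in text)
        # Characters the table does not cover are written as "?".
        return downconverted.encode("latin-1", errors="replace")
    if option == 4:
        # Raw CP1252 bytes; raises UnicodeEncodeError outside CP1252.
        return text.encode("cp1252")
    raise ValueError(f"unknown option: {option}")


sample = "\u201cSmart\u201d quotes\u2122"
for option in (1, 2, 3, 4):
    print(option, encode_option(sample, option))
```

Option 1 relies on Python's `xmlcharrefreplace` error handler, which always references the Unicode code point (`&#8220;`). It never produces a reference to the CP1252 byte value such as `&#147;`, which under the HTML standard denotes the C1 control code U+0093 rather than a smart quote.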
If you are concerned about the default, you can still implement this
menu now (so that customers like MSNBC can select option 3) and use
option 4) as a default at the moment. Two releases later, you can make
1) the default when 95% of your customers have Unicode browsers.
If you are concerned about the amount of text, you can easily move
all of this into a help screen accessible from the menu.
> Now, if your competitor offered this option:
> d) Compatible with all browsers used _in your company_
> you would have a hard time competing. (Note the emphasis on "your
> company" in the fourth option, meaning the customer's company. I could
> even go on to say "most browsers on the Internet", but that got me in
> trouble last time :-))
> Erik raised an option of writing the actual byte value of the character
> in the file. It was my understanding that this can cause trouble in some
> Unix servers that are not expecting such byte values.
> Can someone comment here?
If you check my reply again, you'll find that I suggested the
same solution there, too (see option 4 above):
>> - output directly in CP1252 bytes (not NCR!) and make sure the
>> IANA registry contains a reasonable MIME entry for CP1252 so that
>> the HTTP server will announce CP1252 as the encoding.
It does not really help in finding a simple common solution
among all platforms, but it is formally better than using the illegal encodings.
I would be surprised if Unix servers have problems with bytes in the
C1 range. They should normally just pass these values on unchanged.
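The point about byte-transparent servers can be illustrated with a toy round trip. The Python sketch below uses hypothetical helpers (`build_response`, `read_response`) and a minimal HTTP/1.0-style response, not any real server's code. The body bytes are copied verbatim, and because every CP1252-only byte is 0x80 or above, none of them can be mistaken for an ASCII delimiter such as CR, LF or `:`. Whether the reader ends up with smart quotes or C1 control codes depends only on the charset label; `windows-1252` is the name the IANA character set registry lists for CP1252.

```python
def build_response(body: bytes, charset: str) -> bytes:
    # Hypothetical minimal response: an ASCII header block, then the body
    # bytes appended verbatim (nothing in the body is inspected or changed).
    header = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: text/html; charset={charset}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body


def read_response(raw: bytes) -> str:
    # Split on the first blank line; body bytes >= 0x80 never match CR or LF.
    head, body = raw.split(b"\r\n\r\n", 1)
    charset = "iso-8859-1"  # HTTP/1.1 (RFC 2068) default for text/* without a label
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() != "content-type":
            continue
        for param in value.split(";")[1:]:
            key, _, val = param.strip().partition("=")
            if key.strip().lower() == "charset":
                charset = val.strip()
    return body.decode(charset)


page = "\u201cHello\u201d\u2122".encode("cp1252")  # b'\x93Hello\x94\x99'
print(ascii(read_response(build_response(page, "windows-1252"))))
print(ascii(read_response(build_response(page, "iso-8859-1"))))
```

With the `windows-1252` label the reader recovers the smart quotes and the trademark sign; with the `iso-8859-1` label the very same unchanged bytes decode to the C1 control codes U+0093, U+0094 and U+0099.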
Markus G. Kuhn, Computer Science grad student, Purdue
University, Indiana, USA -- email: firstname.lastname@example.org