RE: Usage of CP1252 characters on www.msnbc.com

From: Chris Pratley (chrispr@microsoft.com)
Date: Mon Jul 07 1997 - 20:29:20 EDT


I'm sorry you took that meaning from my mail. I certainly wasn't trying
to be offensive, and I certainly was not trying to blame anyone for the
current situation. I think you are unfairly assigning to me some
preconceptions you may have about this issue.

I was raising a real customer issue. When I say "customer", I could be
talking about people that have paid money to Microsoft, and run
Microsoft software in their business. In this context, it is reasonable
to say "all of the customer's machines", since in this hypothetical
example this is in fact a Microsoft customer running Windows we are
talking about.

You can extend the statement to customers of other companies. Many
software companies have customers that use exclusively Windows machines,
and for these companies, the statement also holds true.

My statements were from a customer perspective. If I, as a customer, am
setting up an Intranet/Internet solution, I have some goals:
1) Maintain all existing data
2) Avoid any data corruption
3) Serve clients (these can be Intranet users or external people
visiting the corporate web site)

The reality for many corporations is that they will not accept a
statement like "I'm sorry that our new product breaks your currently
functioning solution, but we have to do this to conform to a standard that
didn't apply to your activities before". It would be extremely arrogant
to tell a customer that they have to break their current solution for
any reason whatsoever except to enable a better solution for them, and
even then they want the option. If you have ever had customers yell at
you in the past for "arbitrarily" changing things that break them, even
when it is for their own good, then you will understand the burning need
to do what they want more than just about anything. I think you can see
how, in being strict on this point of encodings, you are damned if you
do (customers yell at you) and damned if you don't (non-customers yell
at you).

Microsoft software already makes the automatic conversion from these
code points to Unicode when opening such files, as I suggested a Unix
browser might consider. But such characters are not automatically
translated to Unicode NCRs on output, because of the issue I raised. Our
competitors do not do this either, and they defined this behavior in the
first place (writing out &#128-&#159). As you said, Microsoft was a
latecomer, and this behavior had already been defined (for better or for
worse).
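
To make the conversion discussed above concrete, here is a minimal
sketch in Python of how a lenient reader might reinterpret numeric
references in the 128-159 range as CP1252 characters and rewrite them
as Unicode NCRs. The function name fix_c1_ncrs is hypothetical, the
mapping is whatever Python's built-in "cp1252" codec defines, and this
is not a description of what any Microsoft or other product actually
does.

```python
import re

# Any decimal numeric character reference, with or without the
# trailing semicolon (old pages often wrote "&#147" without it).
_NCR = re.compile(r"&#([0-9]+);?")


def fix_c1_ncrs(html: str) -> str:
    """Rewrite &#128..&#159 as the Unicode NCR of the CP1252 character.

    Illustrative helper only; references outside 128-159 are left alone.
    """
    def repl(match):
        value = int(match.group(1))
        if not 128 <= value <= 159:
            return match.group(0)
        try:
            char = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            # A few positions (e.g. 129, 141) are unassigned in
            # Python's CP1252 table; leave those references untouched.
            return match.group(0)
        return "&#%d;" % ord(char)

    return _NCR.sub(repl, html)


print(fix_c1_ncrs("&#147quoted&#148;"))
```

Under these assumptions the smart quotes written as &#147 and &#148
come out as &#8220; and &#8221;, while references such as &#233; pass
through unchanged. Whether an authoring tool should apply this on
output is exactly the compatibility question raised above.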

At the start of the Internet phenomenon, NCRs were not defined to be
Unicode (in fact to my knowledge, this is STILL not a standard
officially, which is why it is an RFC). People had data that used these
characters. There were no named entities for things like smart quotes.
They had to be round-tripped somehow. I don't think the issues are as
black and white as you claim. In any case, I'm not interested in that
discussion. I'm interested in addressing the issue I posted about.

Do you (or anyone else) have some suggestions on this issue? I think it
is a hard problem to solve, and I was trying to get a sense of what
solutions people were adopting.

Thanks,
Chris
PS. Please remember that I am not speaking for Microsoft.

        -----Original Message-----
        From: kuhn@cs.purdue.edu [SMTP:kuhn@cs.purdue.edu]
        Sent: Monday, July 07, 1997 3:13 PM
        To: Chris Pratley
        Subject: Re: Usage of CP1252 characters on www.msnbc.com

        chrispr@microsoft.com wrote on 1997-07-07 19:44 UTC:
> 3. Write out as &#147 and &#148.
> Oh look, on all of the customer's machines these display just fine. It
> turns out that virtually all old browsers can understand these
> characters. There is .
> This is a problem for the external web site, but all the home users they
> are trying to reach can read those characters fine.

        #define FLAME_MODE FIRE_AT_WILL

        Your statement shows again this type of Microsoft arrogance that I have
        started to love over the years. "all of the customer's machines" versus
        "a small % that does not (e.g. some Unix browsers)".

        May I remind you that HTML and the Web were developed on NeXTStep and
        Unix and that e.g. I have been using HTML for almost two years before
        the first MS-Windows browser was available? Microsoft is a guest in the
        Web arena, not a Web god. We have well-established standards and it would
        suit Microsoft well to follow those standards and not to try to replace
        them with its proprietary standards just because of Microsoft's inertia.

        Your mail sounded like the C1 problem is after all the fault of the authors
        of Unix and Mac Web browsers! The contrary is true: Microsoft could
        have supported fully automatic CP1252 <-> Unicode NCR conversion right
        from the beginning, and we wouldn't have had this problem today. I feel
        that now it is your turn to tell your customers that you are sorry for
        creating this character set mess and that they better shall follow now
        established standards once Microsoft has finally understood them.

        #define FLAME_MODE COOL_DOWN

        Pheeew, now I feel better ... ;-) [sorry, nothing personal]

        Markus :)

        --
        Markus G. Kuhn, Computer Science grad student, Purdue
        University, Indiana, USA -- email: kuhn@cs.purdue.edu


