RE: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 28 2010 - 15:22:22 CDT

Next message: Asmus Freytag: "Re: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)"

Previous message: Mark Crispin: "Re: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)"

Mark Crispin <mrc plus unicode at panda dot com> wrote:

> On Mon, 28 Jun 2010, Mark Davis ☕ wrote:
>> The problem with slavishly following the charset parameter is that it
>> is often incorrect. However, the charset parameter is a signal into
>> the character detection module, so if the charset is correctly
>> supplied from the message then the results of the detection will be
>> weighted in that direction.
>
> I interpret these two sentences as:
>
> "The problem with following the standards is that some people don't
> follow the standards. So instead of following the standards
> ourselves, we will guess if the other guy follows the standards or
> not, no matter how much he claims to follow standards. Too bad if our
> fix transforms his valid data into garbage."

At the very least, it would be nice if the charset parameter constituted
a much stronger signal into the detection module than it apparently did
in Andreas' case, so that if he says the text is 8859-15, and we already
know that 8859-15 is nearly impossible to distinguish heuristically from
8859-1, the module might as well take his word for it.
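To make that concrete, here is a minimal Python sketch of the policy I
have in mind. It is purely my own illustration and says nothing about
how Google Groups or any real detector is implemented; the names
choose_charset and decodes_cleanly and the fallback list are
hypothetical. The idea is simply that a declared charset which decodes
the bytes without error wins outright, and guessing happens only when
the label is missing, unknown, or demonstrably wrong.

```python
def decodes_cleanly(data: bytes, charset: str) -> bool:
    """Return True if `data` decodes under `charset` without error."""
    try:
        data.decode(charset)
        return True
    except (UnicodeDecodeError, LookupError):
        # LookupError covers charset labels Python does not recognize.
        return False


def choose_charset(data: bytes, declared=None,
                   fallbacks=("utf-8", "iso-8859-1")) -> str:
    """Hypothetical policy: trust the declared charset when it is plausible.

    Every byte sequence is valid in both ISO 8859-1 and ISO 8859-15, so a
    heuristic has little to go on between them; the sender's label is the
    best evidence available and is taken at face value.
    """
    if declared and decodes_cleanly(data, declared):
        return declared
    # Only guess when the label is absent, unknown, or contradicted by the data.
    for candidate in fallbacks:
        if decodes_cleanly(data, candidate):
            return candidate
    return "iso-8859-1"  # last resort: decodes any byte sequence


# Byte 0xA4 is the euro sign in 8859-15 but the currency sign in 8859-1.
body = b"100 \xa4"
charset = choose_charset(body, declared="iso-8859-15")
text = body.decode(charset)
```

With the label honored, the byte 0xA4 comes out as the euro sign; a
detector that overrode the label with 8859-1 would silently turn it
into the generic currency sign instead.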

I do tend to agree with Mark that the complaint against Google Groups
(with which I am not affiliated) might have been posted with more
civility and less invective.

--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s

This archive was generated by hypermail 2.1.5 : Mon Jun 28 2010 - 15:26:04 CDT