RE: charset parameter in Google Groups (was Re: Indian Rupee Sign to be chosen today)

From: Doug Ewell (doug@ewellic.org)
Date: Mon Jun 28 2010 - 15:22:22 CDT


    Mark Crispin <mrc plus unicode at panda dot com> wrote:

    > On Mon, 28 Jun 2010, Mark Davis ☕ wrote:
    >> The problem with slavishly following the charset parameter is that it
    >> is often incorrect. However, the charset parameter is a signal into
    >> the character detection module, so if the charset is correctly supplied
    >> from the message then the results of the detection will be weighted
    >> in that direction.
    >
    > I interpret these two sentences as:
    >
    > "The problem with following the standards is that some people don't
    > follow the standards. So instead of following the standards
    > ourselves, we will guess if the other guy follows the standards or
    > not, no matter how much he claims to follow standards. Too bad if our
    > fix transforms his valid data into garbage."

    At the very least, it would be nice if the charset parameter constituted
    a much stronger signal into the detection module than it apparently did
    in Andreas' case, so that if he says the text is 8859-15, and we already
    know that 8859-15 is nearly impossible to distinguish heuristically from
    8859-1, the module might as well take his word for it.
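
    To make the "stronger signal" idea concrete, here is a minimal sketch
    in Python. It is not how Google Groups or any real detector works; the
    function name pick_charset, the scores, and the weight of 3.0 are all
    hypothetical, chosen only to show a declared charset breaking a
    near-tie while still losing to overwhelming contrary evidence.

```python
from typing import Dict, Optional


def pick_charset(heuristic_scores: Dict[str, float],
                 declared: Optional[str] = None,
                 declared_weight: float = 3.0) -> str:
    """Pick a charset from heuristic scores, boosted by the declared label.

    heuristic_scores: hypothetical detector output, charset name -> score.
    declared: the charset parameter supplied with the message, if any.
    declared_weight: assumed multiplier for the declared charset's score.
    """
    if not heuristic_scores:
        raise ValueError("no candidate charsets")

    # Normalize names so "ISO-8859-15" and "iso-8859-15" compare equal.
    weighted = {name.lower(): score for name, score in heuristic_scores.items()}

    if declared is not None:
        key = declared.lower()
        if key in weighted:
            # The sender's label counts for more than the raw heuristic.
            weighted[key] *= declared_weight

    # Highest weighted score wins.
    return max(weighted, key=weighted.get)


# Byte 0xA4 is valid in both charsets, which is why heuristics struggle:
# it is the euro sign in 8859-15 and the generic currency sign in 8859-1.
euro = b"\xa4".decode("iso8859-15")
currency = b"\xa4".decode("iso8859-1")

# A near-tie between 8859-1 and 8859-15 (made-up scores).
near_tie = {"iso-8859-1": 0.51, "iso-8859-15": 0.49}
undeclared_choice = pick_charset(near_tie)
declared_choice = pick_charset(near_tie, declared="ISO-8859-15")

# Strong contrary evidence still overrides an incorrect label.
mislabeled = {"utf-8": 0.95, "iso-8859-15": 0.05}
mislabeled_choice = pick_charset(mislabeled, declared="ISO-8859-15")
```

    With these invented numbers, the undeclared near-tie resolves to
    8859-1, the declared one resolves to 8859-15, and the mislabeled
    UTF-8 text is still detected as UTF-8. A multiplicative weight is
    only one possible design; a real module might treat the label as a
    prior in some other way.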

    I do tend to agree with Mark that the complaint against Google Groups
    (with which I am not affiliated) might have been posted with more
    civility and less invective.

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

