From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Jul 09 2004 - 02:59:52 CDT
On 09/07/2004 00:01, Kenneth Whistler wrote:
>Peter Kirk said:
>
>>I made a serious point, not apparently made in the UTR draft, that
>>diacritic folding may be useful for spam filtering and similar
>>applications including finding misleading URIs.
>
>This seems like a reasonable point to make and to add to the discussion
>of folding in UTR #30.
>
>>António suggested a
>>serious point that for more comprehensive spam filtering an enhanced
>>folding might be useful, including such foldings as | > I (capital i)
>>and l (small L), 0 (zero) > O, |\/| > M. Would such foldings in fact be
>>feasible and useful?
>
>Well, someone could try, I suppose, but this stuff tails out pretty
>rapidly into mind-boggling complexity, ...
>
Indeed. I wouldn't suggest going beyond the clearly shape-based. But it
is hard to know where to draw the line, which is another reason to add
to /|/|ike's good ones for not trying to standardise this. But this kind
of approach based on UTR #30 may still be helpful for spam filtering
developers.
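
To make the idea concrete, here is a minimal sketch of how a filter might combine diacritic folding with a few clearly shape-based foldings. The tables and the function names (fold_diacritics, fold_shapes, skeleton) are hypothetical and hold only the examples mentioned in this thread; none of this comes from UTR #30, and stripping combining marks after NFD is only a rough stand-in for a real diacritic folding.

```python
import unicodedata

# Hypothetical, deliberately tiny tables built from the examples in this
# thread. They are illustrations only, not part of UTR #30 or any standard.
MULTI_CHAR_FOLDS = [("|\\/|", "M"), ("/|/|", "M")]
SINGLE_CHAR_FOLDS = str.maketrans({"|": "I", "0": "O"})


def fold_diacritics(text: str) -> str:
    """Rough diacritic fold: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_shapes(text: str) -> str:
    """Apply shape-based folds, longest sequences first."""
    # Multi-character sequences must go first, otherwise the single "|"
    # rule would break up "|\/|" before it can be folded to "M".
    for source, target in MULTI_CHAR_FOLDS:
        text = text.replace(source, target)
    return text.translate(SINGLE_CHAR_FOLDS)


def skeleton(text: str) -> str:
    """Fold diacritics and shapes, then uppercase for comparison."""
    return fold_shapes(fold_diacritics(text)).upper()
```

A filter would compare skeleton(candidate) against the skeletons of the words it wants to catch. This sketch folds | only to capital I and does not try to unify it with small L; choices like that are exactly where the line becomes hard to draw.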
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/