Re: Diacritic and similar foldings and spam filtering

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Jul 08 2004 - 17:46:31 CDT

Next message: Mike Ayers: "RE: Diacritic and similar foldings and spam filtering"

Previous message: Kenneth Whistler: "Re: Diacritic and similar foldings and spam filtering"
In reply to: Doug Ewell: "Re: Diacritic and similar foldings and spam filtering"
Next in thread: Kenneth Whistler: "Re: Diacritic and similar foldings and spam filtering"

On 08/07/2004 23:22, Doug Ewell wrote:

>Peter Kirk <peterkirk at qaya dot org> wrote:
>
>>António suggested a serious point that for more comprehensive spam
>>filtering an enhanced folding might be useful, including such foldings
>>as | > I (capital i) and l (small L), 0 (zero) > O, |\/| > M. Would
>>such foldings in fact be feasible and useful? They would have to be
>>part of a general similar shapes folding.
>
>They might be useful for certain applications, in specific situations,
>but Unicode should not ever try to get entangled in this business of
>mapping unrelated characters on the basis of glyph similarity alone.
>It's just too font-dependent and subjective.
>
>See the sub-heading "Spoofing" in TUS 4.0, Section 5.19 "Unicode
>Security," pp. 141-142 for more information.
>
Thank you for pointing me to this section. This is a useful discussion
which shows clearly why spoofing cannot be avoided by identical encoding
of confusables. (And I am glad to see some clearer terminology than I
had been using.) But it doesn't address my point that UTR #30 folding
can be useful in this area, in providing a framework for what might be
called "confusable folding".

But I think I agree with you that Unicode should not get into listing
confusables in detail, because that is too font-dependent and subjective.
This kind of thing is best left as a user-definable folding.
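
To make the idea concrete, here is a rough sketch in Python of what a
user-definable "confusable folding" might look like. Nothing here comes
from UTR #30: the function name make_folder and the contents of
EXAMPLE_TABLE are hypothetical, and the table simply picks one possible
target for each of the examples quoted above (| and l folded to I, 0 to
O, |\/| to M). A real filter would supply its own table.

```python
import re

def make_folder(table):
    """Build a folding function from a user-supplied mapping.

    `table` maps a confusable source string to its folded target.
    This is an illustrative sketch, not anything defined by UTR #30.
    """
    if not table:
        return lambda text: text  # nothing to fold
    # Try longer sources first so that "|\/|" wins over a bare "|".
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def fold(text):
        return pattern.sub(lambda m: table[m.group(0)], text)

    return fold

# Hypothetical table based on the examples in this thread.
EXAMPLE_TABLE = {
    "|\\/|": "M",  # |\/| > M
    "0": "O",      # zero > capital O
    "|": "I",      # vertical bar > capital I
    "l": "I",      # small L > capital I
}

fold = make_folder(EXAMPLE_TABLE)

# Two strings are treated as confusable if they fold to the same result.
def confusable(a, b):
    return fold(a) == fold(b)
```

With this table, confusable("MONEY", "|\\/|0NEY") holds, because both
strings fold to the same text. The choice of targets is exactly the
font-dependent and subjective part, which is why the table is passed in
rather than fixed.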

Actually it is unclear to me from UTR #30 whether it is supposed to be a
framework for user-definable foldings or should be restricted to the
defined list of foldings; the existence of "Foldings based on tailored
collation data" suggests that foldings can at least be tailored, but
there are no further details of how such foldings are covered by the UTR.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Thu Jul 08 2004 - 18:19:34 CDT