[Esd-l] Spam Filtering

Peter Hanecak hanecak at megaloman.com
Wed Jul 31 00:36:01 PDT 2002


Hello,

On Tue, 30 Jul 2002, Eric Brosius wrote:

> As are most admins, we're getting a little sick of all the spam floating
> around the internet.  I've read though past emails and I'm going to look
> into the links on procmail's website.  But I'm curious to hear what most
> of you are doing to block 'unwantable' words in the subject and/or body
> of messages.  What works best?  Does the sanitizer do it?  What is
> everyone doing about it??  Thanks for sharing the knowledge.

I'm using a set of simple procmail rules and sendmail's access file to
help me with SPAM:

1) "for sure" rules: these rules are meant to be (and have to be) 100%
free of false positives; they do not catch every SPAM but they catch most
of it (note: I'm not sending any messages to /dev/null, so nothing can be
lost and I also get some statistics)

example:

	# some SPAM has the "To" field set to addresses like
	# Undisclosed.Recipients at our.gateway.com so I know for
	# sure that some "To" faking is in progress and the
	# message is SPAM, scam or something along that line
	:0:
	* ^To.*(Undisclosed.Recipients|Money.in.Motion)@our\.gateway\.com
	mail/spam`date +%y`


2) "almost 100% accuracy" rules: these rules try to catch SPAM and almost
nothing but SPAM, but I'm aware that some legitimate messages can be caught
by them (even if the chance is 1:1000); they file messages into something
I call a SPAM quarantine, which I look through once a day

example:

	# set of rules which catch messages not directed to me - I'm
	# omitting them because there are quite a lot of them, like:
	#	:0:
	#	* ^TO_.*hanecak at megaloman.com
	#	mail/spam-quarantine
	# false positives are messages which are BCCed to me

	# rule to catch those quite "polite" senders of
	# unwanted advertisement
	:0:
	* ^Subject.*ADV\:
	mail/_spam
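
For illustration, a negated condition is one way the omitted "not directed
to me" rules could be written. This is only a sketch: me@example.com is a
placeholder address, and the folder name is borrowed from the commented
example above. The leading "!" inverts the match, so the recipe fires when
none of the destination headers covered by procmail's ^TO_ macro mention
the address. Such a recipe would have to sit after any mailing-list
recipes, otherwise list traffic (addressed to the list rather than to the
recipient) would land in the quarantine too.

	# sketch only: me@example.com is a placeholder address
	:0:
	* !^TO_me@example\.com
	mail/spam-quarantine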


3) the rest is sorted on the principle "every mailing list has its own
folder", and whatever remains goes to INBOX

4) notorious junk senders are listed in sendmail's access file with
"ERROR:550 Spammers are banned from our site" and (if that check is
effective) messages from them are no longer delivered to me (and my
colleagues)
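
Point 3 could be sketched roughly as follows. The list address and folder
name are made-up placeholders, not taken from the actual setup, and one
such recipe would be needed per list. Anything that no recipe matches
falls through to procmail's default mailbox ($DEFAULT), which is how the
remainder ends up in INBOX.

	# sketch only: address and folder are hypothetical placeholders
	:0:
	* ^TO_some-list@lists\.example\.com
	mail/some-list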


In practice it works out like this (numbers apply to this year):

	1) I have received 3340 unwanted junk messages so far this year
	(compared to 1944 junk messages last year!)

	2) about 6-7 of those per day (but sometimes even 20) are filtered
	to spam-quarantine, which I quickly scan for false positives; the
	rest I move to spam`date +%y`

	3) about 2-4 of those per week make it to my INBOX

	4) about 20 messages per week are caught by sendmail's access
	file, so they are never received


Such a system is not that complicated (no AI, no score-based filtering,
...); it has some weak points, but it makes it possible for me to work
with e-mail.


So now I will enjoy hearing about this from others! :)


Sincerely

Peter

-- 
===================================================================
  Peter Hanecak <hanecak at megaloman.com>
  GPG pub.key: http://www.megaloman.com/gpg/hanecak-megaloman.txt
===================================================================


