Reducing Spam

I had a thought today about reducing spam in my inbox, since the current scoring mechanism seems to be falling short lately. The most effective blocking measure I’ve seen is to add the IP addresses of servers sending the spam to a black hole list, but with spam bot nets running rampant, new spamming IPs are appearing faster than they can be detected and listed. What can we do to speed up detection?

I think a distributed attack will require a distributed solution. I will describe a social-network-like approach to fighting spam. I have written no infrastructure to support this scheme, and I provide this description solely as a starting point for someone with more initiative.

This idea is probably just a rehash of some existing methods out there. We know that web scrapers regularly crawl our sites, searching for e-mail addresses. Some people generate web pages with hundreds or thousands of invalid addresses in order to clog the spammers’ servers. Bot nets negate the effectiveness of these tactics because the load is spread across thousands of spam servers (usually compromised home, government, and corporate PCs).

I believe a more effective means of reducing spam would be to actively identify the spam bots and add them to your server’s black hole list. Let’s start with the basic approach.

Add a fictitious (bait) e-mail address to one or two of your web pages, but make the address look legitimate, like todd.wilson@noscience.net. Be wary of generating the address with an algorithm: once this defense becomes popular, spammers might be able to detect generated addresses. You can make the address invisible to prevent your human readers from seeing it. Within a few weeks, the address will be on quite a few spam lists.

On your server, configure the fictitious address to forward to /dev/null. Create a cron job that scans your maillog every few minutes for e-mail sent to the fictitious address. For each such message, record the IP address of the sender and the time of the message, and add the IP address to your black hole list.

It is probably wise to automatically expire IPs from your list after about 14-30 days. It’s possible that spammers will learn to rotate addresses over time, so the expiration times may need to be extended as the tides change.

For improved results, you will probably want about ten times as many bait addresses for your domain as real addresses. You need to balance the number of bait addresses against the capacity of your server to filter spam. If you can block the IP addresses on your list at your router, you may reduce your bandwidth usage substantially, but I wouldn’t block IPs there unless I could add and remove them automatically.
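The cron job described above could be sketched roughly as follows. This is only an illustration, not a tested tool: it assumes Postfix-style log lines, where the client IP and the recipient appear on separate lines that share a queue ID, and the names scan_log, expire, and LISTING_PERIOD are made up for this example. For simplicity it stamps entries with the scan time instead of parsing each message's own timestamp.

```python
import re
from datetime import datetime, timedelta

BAIT_ADDRESSES = {"todd.wilson@noscience.net"}  # bait address from the post
LISTING_PERIOD = timedelta(days=30)  # upper end of the 14-30 day range

# Assumption: Postfix-style logging, where one line carries the client IP
# and a later line carries the recipient, both tagged with a queue ID.
CLIENT_RE = re.compile(r"smtpd\[\d+\]: ([0-9A-F]+): client=\S*\[([0-9a-fA-F.:]+)\]")
RCPT_RE = re.compile(r": ([0-9A-F]+): to=<([^>]+)>")


def scan_log(lines, blacklist, now):
    """List the sender IP of every message addressed to a bait address.

    blacklist maps IP -> expiry time and is updated in place.
    """
    clients = {}  # queue ID -> client IP
    for line in lines:
        match = CLIENT_RE.search(line)
        if match:
            clients[match.group(1)] = match.group(2)
            continue
        match = RCPT_RE.search(line)
        if match and match.group(2).lower() in BAIT_ADDRESSES:
            ip = clients.get(match.group(1))
            if ip is not None:
                blacklist[ip] = now + LISTING_PERIOD
    return blacklist


def expire(blacklist, now):
    """Return a copy of the list without entries whose time has run out."""
    return {ip: until for ip, until in blacklist.items() if until > now}
```

A cron job would feed scan_log the lines written since the previous run, call expire, and then push the result to whatever actually does the blocking (mail server, firewall, or router).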

This mechanism will provide only a limited reduction in spam for a single domain. To improve on it, you may want to collaborate with 10-20 friends and associates, creating a trusted network. The servers can be configured to transmit their black hole lists and timeouts to each other a couple of times a day, or to share a single database server in real time.
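One way the periodic exchange could combine lists, assuming each server represents its list as a mapping from IP address to expiry time (a representation chosen for this sketch, not something the scheme requires). The function name merge_blacklists is hypothetical.

```python
from datetime import datetime


def merge_blacklists(local, *peer_lists):
    """Union of all lists; if an IP appears more than once, keep the latest expiry."""
    merged = dict(local)
    for peer in peer_lists:
        for ip, until in peer.items():
            if ip not in merged or until > merged[ip]:
                merged[ip] = until
    return merged
```

Keeping the latest expiry means an IP that is still hitting any one member's bait addresses stays listed for everyone in the network.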

Working as a group should increase the effectiveness of the black hole list. To take this a step further, multiple groups can collaborate. In my opinion, groups should be limited to about 20 trusted servers. If it is found that someone in one of the groups is working to clog the list or identify the bait addresses, the damage will be limited to the group having the rogue member. Once discovered, the other groups can drop IPs added by the infiltrated group.
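Dropping the IPs added by an infiltrated group only works if each entry remembers which group reported it. A rough sketch of that bookkeeping follows; the class GroupedBlacklist and its method names are invented for illustration.

```python
from datetime import datetime


class GroupedBlacklist:
    """Black hole list that remembers which group reported each IP."""

    def __init__(self):
        self._entries = {}  # ip -> {group name: expiry time}

    def add(self, ip, group, until):
        by_group = self._entries.setdefault(ip, {})
        # Keep the later expiry if this group has already reported the IP.
        by_group[group] = max(until, by_group.get(group, until))

    def drop_group(self, group):
        """Forget everything a group reported, e.g. after it was infiltrated."""
        for ip in list(self._entries):
            self._entries[ip].pop(group, None)
            if not self._entries[ip]:
                del self._entries[ip]

    def blocked(self, now):
        """IPs with at least one unexpired report from a remaining group."""
        return {
            ip
            for ip, by_group in self._entries.items()
            if any(until > now for until in by_group.values())
        }
```

An IP reported by both a trusted group and the rogue group stays blocked after drop_group, since the trusted report still stands.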

Once the groundwork is laid, thought can be given to a scheme for scoring each group’s trustworthiness, and perhaps to automatically extending the time that particularly bad IP addresses spend on the list.
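As one possible shape for the "extend the time" idea, the listing period could double each time an IP is caught again, up to a cap. The doubling rule, the one-year cap, and the name listing_period are all assumptions made for this sketch, not part of the scheme above.

```python
from datetime import timedelta

BASE_PERIOD = timedelta(days=14)  # lower end of the 14-30 day range
MAX_PERIOD = timedelta(days=365)  # hypothetical cap
MAX_DOUBLINGS = 10  # keeps the arithmetic bounded for very large hit counts


def listing_period(hit_count):
    """Listing time for an IP caught hit_count times (1 = first offense)."""
    if hit_count < 1:
        raise ValueError("hit_count must be at least 1")
    doublings = min(hit_count - 1, MAX_DOUBLINGS)
    return min(BASE_PERIOD * 2 ** doublings, MAX_PERIOD)
```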

Posted by Isaac at 7:00 AM on November 13, 2006

Isaac's Diversion
