How You Can Stop Dirty Feed Scrapers In 3 Easy Steps
Stealing is wrong, but some people just don't seem to get it when it comes to intellectual property. Some of my posts take a few hours to write, and it's just plain annoying when people steal my work. I'm sure that you feel the same way.
Normally I don't call out spammers, but since this fine individual also decided to "syndicate" both Matt Cutts (spam assassin extraordinaire) and SEO Black Hat (spammer extraordinaire), I will document exactly what he is doing and how to stop it. (Yes, someone is stupid enough to steal content from Matt!)
All of us who use WordPress automatically generate web feeds. Feeds provide the same information as our web pages, but in a common XML-based format, so that applications such as feed readers can process and aggregate information from various sources. By default, WordPress provides the full content of each post in its feeds.
Unfortunately, this also gives spammers convenient access to content for their spamming enterprises. It's possible to remove the full content from the feed and provide only excerpts, but that makes your feed less useful. Thankfully, most spammers aren't too bright, and they fetch your feed from the same IP address as the spam web site where they post it. So here's how to block them:
1. Get the IP of the web site that is stealing your content.
% ping www.trafficboosterpro.com
PING trafficboosterpro.com (74.52.58.162): 56 data bytes
2. Search your logs for that IP address (via SSH).
% grep -F "74.52.58.162" www.20061231
74.52.58.162 - - [31/Dec/2006:01:00:38 -0500] "GET /blog/feed/ HTTP/1.0" 200 49330 "-" "TrafficBoosterPRo (+http://TrafficBoosterPro.com/)"
3. Place the following directives in your .htaccess file.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^74\.52\.58\.162$
RewriteRule ^.*$ - [F]
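Steps 2 and 3 can also be scripted. What follows is only a rough sketch, not a definitive tool: the function names feed_hits_by_ip and block_ip_rules are made up for illustration, and the first one assumes a combined-format access log like the line shown in step 2, where the client IP is the first field and the request path is the seventh. The feed path /blog/feed/ is taken from that log line, so adjust it for your own site.

```shell
# Hypothetical helper: count requests for the feed per client IP,
# busiest first. Assumes field 1 is the IP and field 7 is the path.
feed_hits_by_ip() {
    awk '$7 == "/blog/feed/" { hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$1" | sort -rn
}

# Hypothetical helper: print the .htaccess directives that refuse
# every request from one IPv4 address.
block_ip_rules() {
    # Escape the dots so they match literally in the regular expression.
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'RewriteEngine On\n'
    printf 'RewriteCond %%{REMOTE_ADDR} ^%s$\n' "$escaped"
    printf 'RewriteRule ^.*$ - [F]\n'
}
```

Running feed_hits_by_ip www.20061231 lists the heaviest feed readers first, so a scraper that polls constantly tends to stand out. Running block_ip_rules 74.52.58.162 prints the three directives for that address. Review the output before pasting it into .htaccess.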
Done! Now this cheesy spammer selling cheesy black hat products (which, mind you, wouldn't even work) can't steal my content anymore. Good riddance.
Posted on January 2nd, 2007 by Green Guy