Fighting Content Scrapers

There are scumbags out there who use bots to scrape content from your RSS feed. More often than not, they use an automatic plugin that pulls each article straight out of your feed and republishes it. To say it is annoying is an understatement. They steal your content, they might outrank you in the search results, and if your website is still developing and not receiving much traffic, Google might finger you as the content scraper and banish you to the non-indexed void. In brief, you probably want to do something about it. So, what can we do? The truth is, there are quite a few options, depending on your situation.

So How Do I Catch These Scrapers?

It can be painful to find out whether your content is being scraped. Odds are that if you run a medium or large website, your content is being stolen; small sites get hit less often. The first and most obvious option is to simply Google your article titles, and yes, it can be frustrating and extremely boring. If you put a link back to your site in your RSS footer, you can also use the Ahrefs backlink checker or check your trackbacks to see whether someone has linked to you recently, although this gets impractical for massive websites with thousands of incoming links.
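When a search turns up a suspicious page, you can automate part of the check. The sketch below is a minimal example and assumes you have already downloaded the page's HTML yourself. You could do that with file_get_contents(), which requires allow_url_fopen. The functions normalize_text() and page_contains_phrase() are hypothetical helpers and are not part of WordPress or any plugin. Both sides are reduced to plain text before the comparison: tags are stripped, entities decoded, whitespace collapsed and case lowered. The code then looks for a distinctive sentence from your article.

```php
<?php
// Sketch: check whether a suspected scraper page contains a distinctive
// sentence from one of your articles. Fetching the page is left to you;
// this only compares text. Both function names are hypothetical.

// Reduce HTML to comparable plain text: no tags, decoded entities,
// single spaces, lowercase (ASCII letters only).
function normalize_text(string $text): string
{
    $text = html_entity_decode(strip_tags($text), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;
    return strtolower(trim($text));
}

// True if $phrase (e.g. a sentence copied from your post) appears in $html.
function page_contains_phrase(string $html, string $phrase): bool
{
    $needle = normalize_text($phrase);
    if ($needle === '') {
        return false; // an empty phrase would match every page
    }
    return strpos(normalize_text($html), $needle) !== false;
}
```

Searching for a full sentence makes false positives less likely than searching for a title, because titles are short and often reused.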

Why You Should Do Something

While it may be tempting to do nothing and just keep publishing content, you should at least consider putting a link in your RSS feed, or filing a DMCA takedown if you really feel like it. Authority sites, which are at virtually no risk of being outranked or deindexed, have nothing to fear, but smaller websites may find themselves outranked in search results or, in extreme cases, marked down as a content scraper and banished to the internet wasteland.

DMCA Copyright Takedown

If your content is copyrighted, you can get the scraper's website taken down. Yes, completely removed. Most webhosts will take down the site if you complain to them, so look up the offending scraper on Whois, go to the administrative contact section and send a nice note to the email listed. Since most people use Whois protection, that address often belongs to a privacy service or the registrar rather than the scraper. The registrar usually lists an abuse contact, and a Whois lookup on the site's IP address will generally show you the actual webhost. If you don't want the bother, there are entire companies, like DMCA Services, who are willing to take on the task for you, although it can be a pricey product at $199 per takedown.

But what if you don't have a copyright? Plenty of site owners assume they don't, even though they put that cool "©" in the footer. Who doesn't? In fact, in most countries original writing is protected by copyright automatically from the moment you create it, with no registration needed. So as long as the content is genuinely yours, the host will most likely comply and take down the website.
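You can also script the Whois step. The sketch below pulls candidate email addresses out of raw Whois text, such as the output of the whois command-line tool. It assumes the common "Key: value" layout and keeps addresses from any field whose name mentions "abuse" or "admin". Real field names, such as "Registrar Abuse Contact Email" or "OrgAbuseEmail", vary between registries and hosts. The function name whois_contact_emails() is hypothetical.

```php
<?php
// Sketch: collect contact emails from raw Whois output.
// whois_contact_emails() is a hypothetical helper. It assumes "Key: value"
// lines and keeps emails from any key mentioning "abuse" or "admin";
// actual field names differ between registries and hosts.
function whois_contact_emails(string $whoisText): array
{
    $emails = [];
    $lines = preg_split('/\R/', $whoisText) ?: [];

    foreach ($lines as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Key: value" line
        }
        [$key, $value] = $parts;
        if (!preg_match('/abuse|admin/i', $key)) {
            continue; // only abuse or administrative contacts
        }
        if (preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $value, $m)) {
            foreach ($m[0] as $email) {
                $emails[strtolower($email)] = true; // array keys dedupe
            }
        }
    }

    return array_keys($emails);
}
```

If the whois tool is installed on your server, you could feed the function the output of shell_exec('whois ' . escapeshellarg($domain)).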

Report Them to Google

As the king of the search engines, Google has the first and last word on whether or not your website will be popular, and if they de-index you, you're finished. Getting a site taken down through the DMCA can take a while and/or cost money, so you can do the next best thing and cut off the offending website's greatest source of organic traffic. As long as the content is yours, submitting a copyright complaint to Google takes under five minutes and doesn't cost $199 per takedown. The reporting form doesn't exactly have the best-looking design, but as long as it works, you might as well give it a try.

Google provides a form to report websites that scrape your content.

Keep in mind that the copyright on the content has to be yours, so make sure that's the case. If it is, you may also want to complain to the webhost and get their site taken down.

RSS Footer

Most scrapers steal your content from your RSS feed, a data format used to deliver frequently updated content to readers. WordPress generates the feed automatically at yoursite.com/feed. If you write the RSS feed yourself, you can easily add a link back to your site at the bottom of each article. If you use WordPress, you can use the RSS footer setting provided by Yoast SEO. If you don't use Yoast SEO, you can also add the link with a little code in your theme's functions.php file:

// Hook into every query; only feed requests get the footer filters.
function wpb_feed_filter($query) {
    if ($query->is_feed) {
        add_filter('the_content', 'wpb_feed_content_filter');
        add_filter('the_excerpt_rss', 'wpb_feed_content_filter');
    }
    return $query;
}
add_action('pre_get_posts', 'wpb_feed_filter');

// Append a "this post originally appeared on..." footer to each feed item.
function wpb_feed_content_filter($content) {
    // Content you want to show goes here (escaped before output)
    $url  = esc_url(get_bloginfo('url'));
    $name = esc_html(get_bloginfo('name'));
    $content .= '<p>This post originally appeared on <a href="' . $url . '">' . $name . '</a> and should only appear on <a href="' . $url . '">' . $name . '</a>.</p>';
    return $content;
}

Internal Linking + Affiliate Links

You know, if they're stealing your content, you might as well take advantage of it. Since they're copying directly from your RSS feed, start adding plenty of internal links. After all, a backlink is a backlink. If you put in an affiliate link and someone buys something through it, you'll be earning that commission, not our fellow content scraper. And if you're internally linking and an interested reader clicks on one of those links, you're getting a new page view. Some people think these are bad backlinks, considering that scraping sites break Google's rules, but I doubt Google would penalize you for them, and I'd do it anyway. Do keep in mind that scrapers don't always steal from your RSS feed, so this won't always work.
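Internal links have one catch. A root-relative link such as href="/my-post" resolves against the scraper's domain once it is copied onto their site, so it earns you nothing. The sketch below rewrites those links to absolute URLs. It assumes a hypothetical absolutize_links() helper and a site URL like https://yoursite.com. In WordPress you could call it from a feed content filter like the RSS footer one, but that wiring is not shown here.

```php
<?php
// Sketch: make root-relative links absolute so scraped copies still point
// back to your site. absolutize_links() is a hypothetical helper, and
// $siteUrl (e.g. "https://yoursite.com") is an assumption.
function absolutize_links(string $html, string $siteUrl): string
{
    $siteUrl = rtrim($siteUrl, '/');

    // Match href="/path" or href='/path', but skip protocol-relative
    // links like href="//cdn.example/file", which already name a host.
    $pattern = '/\bhref=(["\'])\/(?!\/)([^"\']*)\1/i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($siteUrl): string {
            // $m[1] is the quote character, $m[2] the path after the leading slash.
            return 'href=' . $m[1] . $siteUrl . '/' . $m[2] . $m[1];
        },
        $html
    );

    return $result ?? $html; // fall back to the input if the regex fails
}
```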

Images

Now this one is a bit more clever. When scrapers steal your content, they usually don't download your images and upload them to their own web server. Instead, they link to your images. In case you're confused, instead of being:

<img src="myimage.png">

it’ll be:

<img src="https://yoursite.com/myimage.png">

This means the image still lives on your server. Every time someone loads the scraper's page, their website fetches myimage.png from your website and displays it. So, hypothetically of course, say you rename the real image to my-image.png and update your own posts to use the new name. You're then free to replace the old myimage.png with anything you feel like, and their website will display it!

This is where you can get creative. You could upload a 600 x 10 000 px image, something that might get them deindexed, or really anything you want. If you're feeling generous, maybe just something that says "This content was stolen from mywebsite.com". As we've said, it's time to be creative. Please let us know what you've come up with; I'd love to hear it.
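Renaming works, but there is a related technique that differs from the renaming trick described above. You can detect hotlinks by the HTTP Referer header and serve the replacement image only to other sites. The sketch below decides whether a request came from someone else's page. It uses a hypothetical is_hotlink() helper and an assumed host name like yoursite.com. Requests with no Referer are let through, because many browsers and privacy tools strip the header and blocking them would hit your real visitors. The flip side is that a scraper can also dodge the check by suppressing the Referer.

```php
<?php
// Sketch: decide whether an image request is a hotlink from another site.
// is_hotlink() is a hypothetical helper; $ownHost (e.g. "yoursite.com")
// is an assumption. Subdomains such as www.yoursite.com count as yours.
function is_hotlink(?string $referer, string $ownHost): bool
{
    if ($referer === null || $referer === '') {
        return false; // no Referer: treat as a normal visitor
    }

    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // unparseable Referer: give it the benefit of the doubt
    }

    $host = strtolower($host);
    $ownHost = strtolower($ownHost);
    $suffix = '.' . $ownHost;

    $isOwn = $host === $ownHost
        || substr($host, -strlen($suffix)) === $suffix;

    return !$isOwn;
}
```

In an image-serving script, you would pass it $_SERVER['HTTP_REFERER'] ?? null and send the "stolen from" image whenever it returns true.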

Use RSS Summaries

As we've said, scrapers (sometimes) steal the content from your RSS feed. They assume you'll be putting the full article in that feed rather than, say, a summary with a "read more" link. Some people do publish summaries, though, and if a scraper steals one, readers will have to click through to your website if they want to continue reading. In WordPress, you can switch your feed to summaries under Settings > Reading, using the "For each post in a feed, include" option.
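If you generate the feed yourself rather than through WordPress, a summary is easy to build. The sketch below uses a hypothetical feed_summary() helper. It keeps the first $maxWords words of the article as plain text and appends a "Continue reading" link. The 55-word default mirrors WordPress's default excerpt length, and the link text is an assumption.

```php
<?php
// Sketch: build a short feed summary with a "read more" link.
// feed_summary() is a hypothetical helper; the 55-word default and the
// link text are assumptions you can change.
function feed_summary(string $html, string $permalink, int $maxWords = 55): string
{
    // Work on plain text so no half-closed tags end up in the summary.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $summary = implode(' ', array_slice($words, 0, $maxWords));
    if (count($words) > $maxWords) {
        $summary .= '...'; // signal that the article continues
    }

    // Escape everything we output, since the text was decoded above.
    return '<p>' . htmlspecialchars($summary, ENT_QUOTES, 'UTF-8') . '</p>'
        . '<p><a href="' . htmlspecialchars($permalink, ENT_QUOTES, 'UTF-8') . '">'
        . 'Continue reading</a></p>';
}
```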

YARPP

YARPP, a popular related posts plugin, places a list of contextually related posts at the end of each article. YARPP can also insert those related posts into your RSS feed, so if scrapers steal the content from the feed, they'll steal the related posts too. It's quite similar to placing a link in your RSS footer, but likely to generate a few more visits to your website. Plus, YARPP will encourage your actual readers to read another article, keeping the bounce rate down.

Overall, that's all I've managed to come up with. Yes, scrapers are annoying, but there are plenty of ways to annoy them back, and we hope these work for you. If you've come up with an idea we missed, we'd be thrilled to hear from you and add it to our article.
