Keeping Google out of stats pages
With referrer spam still a problem, there are steps you can take to at least make the spammers’ efforts futile.
Eric from phpfreaks.com posted a link to his stats pages in a comment on my blog, which means Google won’t be far behind, following the link and indexing those pages. That is not what he had in mind; he just wanted to show me the severity of his referrer spam infestation.
Here’s what he could do to at least cheat the spammers of the payoff. Put this in the robots.txt file:
User-agent: *
Disallow: /stats/

User-agent: ia_archiver
Disallow: /
The second record there is for The Wayback Machine, which I think is a nuisance. I’m not comfortable with copies of my sites ranging back several years, so I always disallow their spider. That’s optional, of course. Some people like it. And most of us like it when others haven’t banned it, so we can see what their sites looked like years ago, or maybe even what a former owner of the domain name did with it. Quite interesting, but not something I want my own site to support…
Not all search engine spiders obey these directives, but the important ones do, and those are the ones the spammers are after. So, mission accomplished.
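If you want to check how a well-behaved spider would read those rules before uploading them, here is a minimal sketch using Python’s standard urllib.robotparser module. The sample paths (/stats/index.html, /index.html) are made up for illustration, and the parser only shows how a compliant robot interprets the file; it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the post, with a blank line between the two records.
ROBOTS_TXT = """\
User-agent: *
Disallow: /stats/

User-agent: ia_archiver
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    robot_parser = RobotFileParser()
    robot_parser.parse(text.splitlines())
    return robot_parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical paths, chosen only to exercise both records.
checks = [
    ("Googlebot", "/stats/index.html"),
    ("Googlebot", "/index.html"),
    ("ia_archiver", "/index.html"),
]

for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(agent, path, verdict)
```

Googlebot falls under the catch-all record, so it loses only /stats/, while ia_archiver matches its own record and is shut out of the whole site.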
If, on the other hand, you don’t want to tell the bad guys about your stats pages (and some of them definitely check the robots.txt for goodies), you could block the bots instead via .htaccess in that specific directory. Here’s how you can do that. Remember to periodically check for bot name changes, and for other bots you may want to add:
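# Flag requests from the major search engine spiders (matched case-insensitively)
SetEnvIfNoCase User-Agent "Googlebot" bots
SetEnvIfNoCase User-Agent "msnbot" bots
SetEnvIfNoCase User-Agent "Yahoo! Slurp" bots
SetEnvIfNoCase User-Agent "jeeves/teoma" bots
# Evaluate Allow first, then Deny, so the Deny below wins for flagged bots
Order Allow,Deny
Allow from all
Deny from env=bots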
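Since bot names change, it can help to test a few User-Agent strings from your logs against the patterns before relying on them. The following is only a rough sketch in Python, not a reimplementation of Apache: it treats each pattern as a regular expression searched anywhere in the header, and it assumes case-insensitive matching (plain SetEnvIf is case-sensitive; SetEnvIfNoCase is not). The function name is_blocked and the sample User-Agent strings are illustrative only.

```python
import re

# The same patterns used in the .htaccess rules above.
BOT_PATTERNS = ["Googlebot", "msnbot", "Yahoo! Slurp", "jeeves/teoma"]


def is_blocked(user_agent, patterns=BOT_PATTERNS):
    """Return True if any pattern occurs in the User-Agent (case-insensitive)."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


# Illustrative User-Agent strings; substitute real ones from your own logs.
SAMPLES = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/2.0 (compatible; Ask Jeeves/Teoma)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]

for ua in SAMPLES:
    print("blocked" if is_blocked(ua) else "allowed", "-", ua)
```

A string that comes back as allowed when you expected blocked is a hint that the bot has changed its name and the pattern list needs updating.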
Spamhuntress, why wouldn’t you just password protect the stats directory? That way, even though their bots access your publicly-available pages, and place links to their (sometimes pretty gross) sites in your logs, the stats pages simply wouldn’t be available given that bots don’t have the password. Your thoughts?
By the way, ScriptyGoddess’ Subscribe to Comments is pretty neat:
http://www.scriptygoddess.com/archives/2004/06/03/wp-subscribe-to-comments/
Diane
Laziness, I suppose. Depends how often you check your stats. How secret they are. Sure, password protecting works. But some like having their stats in the open, and this is one way of letting them do that without aiding spammers.
And as for the plugin, I’m not going to do that. If you leave a comment here and want to see if there are any responses, you can subscribe to the comments feed. I feel that’s enough. I have gotten this suggestion before…