Keeping a guestbook spam-free isn’t easy
A few months ago I needed a new guestbook for a website I had created. With my background, I knew how bad it could get, so I wanted to reduce the amount of moderating I would have to do. First I chose a guestbook written by someone who understood spam. He claimed it would kill spam before it could even be posted.
Yeah, I’ve noticed an AOL member trying several times before her post got through…
But I also figured that if the guestbook couldn’t be found in Google, it wouldn’t be spammed. So I used robots.txt and even put nofollow on the links to it. Problem was, I had forgotten one or two links that didn’t have nofollow on them. Consequently, the guestbook shows up in Google, without a snippet or cached copy, but it’s still named “guestbook”.
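For what it’s worth, robots.txt only tells well-behaved crawlers not to fetch a URL; it does not stop Google from listing a URL it learned about from links elsewhere, which is why such a result shows up with no snippet or cache. A quick way to confirm the rules themselves are right is Python’s standard urllib.robotparser. The domain and the /guestbook/ path below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site whose guestbook lives under /guestbook/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /guestbook/
"""

def is_crawl_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)
```

Checking `is_crawl_allowed(ROBOTS_TXT, "http://example.com/guestbook/index.php")` should come back False, while any page outside /guestbook/ stays allowed.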
So I get the occasional spammy comment.
I have an idea for Googlebot (Matt, are you listening?):
If the only links a site gives to one of its own documents are nofollow, you should not treat links coming from OUTSIDE that domain as an OK to list the document in Google. If the site ITSELF means for it to stay out of Google, that should count as a NO! We can NOT police every link from outside. Granted, this time I was the one who made the error (I had put a link on a forum and on another site I owned; both errors have been fixed now), but surely you can see the potential for mischief here?
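To make the proposal concrete, here is a minimal sketch of the rule as I read it. It is not how Google actually decides anything; the InboundLink type and should_list function are hypothetical, and “same domain” is simplified to “same hostname”.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse

@dataclass
class InboundLink:
    source_url: str  # page that contains the link
    nofollow: bool   # True if the link carries rel="nofollow"

def should_list(target_url: str, links: List[InboundLink]) -> bool:
    """Hypothetical listing rule proposed in the post (not Google's behaviour)."""
    host = urlparse(target_url).hostname
    internal = [link for link in links if urlparse(link.source_url).hostname == host]
    # The site itself only ever links with nofollow: treat that as a "no",
    # no matter how many outside pages link without it.
    if internal and all(link.nofollow for link in internal):
        return False
    # Otherwise behave as usual: any followed link is enough to list the page.
    return any(not link.nofollow for link in links)
```

Note that a crawler could only apply this if it had seen every internal link to the page, which it learns by crawling the site itself.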
December 31st, 2006 at 8:08 am
Are you sure your robots.txt is correct? I would think putting the page in robots.txt is sufficient no matter who links to your site.
December 31st, 2006 at 9:16 am
That’s going to have NO effect on stopping guestbook spam… sorry, it won’t…
January 4th, 2007 at 8:42 pm
The robots.txt should work… at least if it was in place before any links to the file appeared. They say it is checked only once a day. Links with nofollow simply don’t count toward popularity rankings such as Google’s PageRank (so they don’t help the linked page rank well). Google Sitemaps should also be of interest (if it can be automated). Another proper way to keep robots from putting a page in their index is to set a robots meta tag (or a googlebot-specific one) to noindex, nofollow in the page’s HTML head.
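One way to check that a template really carries the robots meta tag is to parse the page with Python’s html.parser. A minimal sketch follows; has_noindex is a hypothetical helper name. Keep in mind that a crawler can only see this tag on pages it is allowed to fetch, so a page blocked in robots.txt never has its noindex read.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() in ("robots", "googlebot"):
            for part in values.get("content", "").split(","):
                self.directives.add(part.strip().lower())

def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives
```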
Adding the rel="nofollow" attribute by default to all user-supplied links in guestbooks, forums, or blog comments (maybe with a possibility to…) is also recommended: it keeps spammers from gaining popularity through those links, particularly if you say so clearly.
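Applied to a guestbook, “nofollow by default” could mean rendering every visitor-supplied homepage link like this. It is a minimal sketch: render_user_link is a hypothetical helper, and the choice to drop non-http(s) URLs entirely is my own addition, not part of the comment.

```python
from html import escape
from urllib.parse import urlparse

def render_user_link(url: str, text: str) -> str:
    """Render a visitor-supplied link with rel="nofollow"."""
    if urlparse(url).scheme not in ("http", "https"):
        # Not a web link (e.g. javascript:): show the text only, no <a> tag.
        return escape(text)
    return f'<a href="{escape(url, quote=True)}" rel="nofollow">{escape(text)}</a>'
```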
January 5th, 2007 at 9:54 am
Blocking these hosting (data center) ranges would help a little; almost all POST requests to my “guestbook” come from them:
- ThePlanet: 70.84.0.0/14, 74.52.0.0/16
- Everyones Internet: 207.44.128.0/17, 64.246.0.0/18, 66.98.128.0/17
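The same check written out in code, using Python’s ipaddress module. These are the ranges quoted above from early 2007; address blocks change hands, so treat the list as an example rather than a current blocklist. In practice this kind of blocking is usually done in .htaccess or the server configuration rather than in the script.

```python
import ipaddress

# Ranges quoted in the comment above (January 2007).
BLOCKED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in (
    "70.84.0.0/14", "74.52.0.0/16",                        # ThePlanet
    "207.44.128.0/17", "64.246.0.0/18", "66.98.128.0/17",  # Everyones Internet
)]

def is_blocked(client_ip: str) -> bool:
    """True if the client address falls inside any blocked range."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is simply never "in" an IPv4 network.
    return any(addr in net for net in BLOCKED_NETWORKS)
```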
January 5th, 2007 at 10:02 am
After a guestbook has been “compromised”, I recommend renaming the directory it lives in, putting the new directory in robots.txt, and putting [META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"] in the guestbook’s template files.
For skilled coders, I would recommend deleting all “user www” (homepage) fields.
January 9th, 2007 at 8:36 am
Here’s an idea that would stop spam forever.
http://bloggingpoet.squarespace.com/bloggingpoetcom/2007/1/9/the-anti-spam-reserves-finally-an-end-to-spam.html
I hope you and others will help lead the way.
January 15th, 2007 at 3:33 pm
The meta tag Lemat is referring to is documented here: http://www.google.com/support/webmasters/bin/answer.py?answer=35303&query=meta+no+crawl&topic=&type=
And I suppose once you’ve been ‘discovered’, you’re undoubtedly on a list of some sort, and it won’t matter whether you’re removed from the cache/index or not. As long as their script can keep making its automated POSTs to your guestbook, it will probably keep visiting you. One thing you could do is move the folder that contains your guestbook, exclude it with robots.txt and the meta tag so it doesn’t show up on Google, and use a JavaScript redirect to send your visitors to it. (I don’t know whether spammers can follow JavaScript links, but it’s certainly different from just grepping for a href tags.)
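To illustrate the last point, here is a deliberately naive harvester that greps for href attributes. It finds ordinary links but not a URL assembled in JavaScript. This is only a sketch under that assumption: a bot that actually runs JavaScript, or searches script text for paths, would still find the guestbook. The /gbook/ path is hypothetical.

```python
import re
from typing import List

# A naive harvester: find href="..." attributes with a regular expression.
HREF_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)

PAGE = """
<a href="/about.html">About</a>
<a href="#" onclick="location.href = '/g' + 'book/'; return false;">Guestbook</a>
"""

def harvest(html: str) -> List[str]:
    """Return every href value the regex can see in the page."""
    return HREF_RE.findall(html)
```

Here `harvest(PAGE)` sees /about.html and the placeholder #, but never the /gbook/ URL built by the onclick handler.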
January 19th, 2007 at 8:43 am
Well, so long as your guestbook has some good anti-spam protection, you only have to worry about manually entered spam (yes, some people are that sad).
As Lemat says, the noindex meta tag should stop Google from listing it regardless of which link it followed.
I have had to rename both my forum and guestbook folders before now, not because they were getting spammed but because of the sheer number of attempts to spam them.
Now I have a pretty good .htaccess file that blocks access from certain networks such as layeredtech, asianet.co.th, seamnetworks.net and so on. I also check my logs regularly to see whether I am getting an unreasonable number of hits from anywhere.
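The log check can be partly automated. A minimal sketch, assuming Apache-style access logs where the client IP is the first field; noisy_clients and its threshold are hypothetical, and the right threshold depends entirely on your traffic.

```python
from collections import Counter
from typing import Dict, Iterable

def noisy_clients(log_lines: Iterable[str], threshold: int,
                  must_contain: str = "") -> Dict[str, int]:
    """Return {ip: hits} for clients with at least `threshold` matching requests.

    `must_contain` optionally restricts counting to lines containing that
    text, e.g. "POST" or the guestbook path.
    """
    counts = Counter(
        line.split(maxsplit=1)[0]  # client IP is the first field
        for line in log_lines
        if line.strip() and must_contain in line
    )
    return {ip: n for ip, n in counts.items() if n >= threshold}
```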
January 21st, 2007 at 6:23 am
Another simple solution: let’s assume there is a web form with
[input type="text" name="email"]
and PHP code for it:
$email = $_POST['email'];
The solution:
[input type="text" name="emailxyz"]
and PHP code for it:
$email = $_POST['emailxyz'];
where xyz is some random text.
If the web form gets spammed again, change the text.
Spammers would have to go to great lengths to handle multiple versions of web forms, and they cannot adapt that fast.
BTW, I have an addentry.php script with no web form at all, and it still gets spammed a few times a day: spammers don’t even bother to check.
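A sketch of the renaming idea, assuming the current suffix is stored somewhere both the form template and the handler can read. make_field_name and extract_email are hypothetical helpers, and a Python dict stands in for PHP’s $_POST.

```python
import secrets
from typing import Dict, Optional

def make_field_name(base: str) -> str:
    """Append a random 6-hex-digit suffix, e.g. 'email' -> 'email3f9a1c'."""
    return base + secrets.token_hex(3)

def extract_email(form: Dict[str, str], field_name: str) -> Optional[str]:
    # Bots replaying a harvested copy of the old form post 'email' (or an
    # older suffix), so the lookup under the current name finds nothing.
    return form.get(field_name)
```

Because the suffix is random, regenerating it and updating the template is all it takes to “change the text” after a new wave of spam.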
January 29th, 2007 at 4:29 am
In my experience, .htaccess is great for protecting all kinds of scripts, forms, etc. from spam.
Here I also list a few great spam-fighting links:
http://it.dennyhalim.com/2007/01/close-to-perfect-htaccess-ban-list.html
January 29th, 2007 at 5:01 am
I did think about randomly assigned input names, but given that some spammers’ programs actually visit the page first to get the captcha, I decided in the end that it was a waste of resources.
February 21st, 2007 at 11:01 am
Hi,
Why not make a guestbook that accepts an entry only after the poster has paid $1 through PayPal?
You could add a note to the guestbook saying that genuine visitors will get their money back once you have checked that they are not spammers.
You could even add a note: “Spammers, you are all welcome, prepare your credit card.”
March 22nd, 2007 at 4:52 pm
I expect it to have only a small effect. In fact, when I analyse the hits to my robots.txt, I find a lot of suspicious-looking ones. Some (non-search-engine) bots seem to read robots.txt specifically to find hidden content, and I would expect some of them to look for guestbooks.
I am no friend of those word-picture verifications, because they usually run on a separate server, so you depend on that server, and I just hate typing those stupid combinations that you can hardly read.
Without spam protection I would get about 70-100 spam entries A DAY. I use a multi-step strategy to prevent spam. The first thing I noticed about the spam bots was that they never filled in the date/time stamp of my guestbook. So I changed my PHP script to check whether the date is filled in, and to sort out the entries where it isn’t. After 2 months with a 100% success rate (of correct filtering), I switched off the email notification I had been using to verify the script.
Earlier this year the first spam got through again, this time with the date field filled with junk text. I got rid of that by checking for the years 2007 to 2011 (yes, I am too lazy to change it every year…).
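The date-field test described here might look roughly like this. It is a sketch under assumptions: the field is free text, any standalone four-digit number counts as a year, and the 2007 to 2011 window is passed in as parameters. date_field_looks_valid is a hypothetical name.

```python
import re

def date_field_looks_valid(value: str, first_year: int = 2007,
                           last_year: int = 2011) -> bool:
    """Reject entries whose date/time stamp is empty or has no plausible year."""
    if not value.strip():
        return False  # the early bots never filled the field in at all
    years = [int(y) for y in re.findall(r"\b(\d{4})\b", value)]
    return any(first_year <= y <= last_year for y in years)
```

Passing `first_year=date.today().year - 1` and `last_year=date.today().year + 1` (with `from datetime import date`) would avoid editing the range every year.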
Alternatively, I could use a hidden field holding a check value. Spam bots do not seem to use the “post” button.
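One common way to build such a check value is to sign a timestamp with a server-side secret when the form is rendered and verify it on submission. This is a minimal sketch of that technique, not the commenter’s code; SECRET_KEY, make_check_value, and check_value_ok are hypothetical names. It catches bots that post without loading the form, but a bot that fetches the form first would receive a valid value too.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical; keep it server-side

def _sign(ts: str) -> str:
    return hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()

def make_check_value(now: Optional[float] = None) -> str:
    """Value for the hidden field: issue time plus its HMAC signature."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(ts)}"

def check_value_ok(value: str, max_age: int = 3600,
                   now: Optional[float] = None) -> bool:
    """True if the value was issued by us and is at most max_age seconds old."""
    try:
        ts, sig = value.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False  # missing or malformed: the form was never loaded
    current = time.time() if now is None else now
    # Compare as bytes so arbitrary (non-ASCII) input cannot raise TypeError.
    genuine = hmac.compare_digest(sig.encode(), _sign(ts).encode())
    return genuine and 0 <= current - issued <= max_age
```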
What I have not found so far is a forum, group, or similar place to exchange tactics and to involve the search engine providers in an anti-spam strategy.
At the moment the spam entries just disappear into nirvana. In the future I would like to collect them separately.
Now, suppose you found a group of webmasters willing to:
- develop a good verification strategy against spam entries that doesn’t rely on pictures
- implement it and publish it as freeware in JavaScript, PHP, Perl, etc.
- collect spam entries in separate “spambooks”
- convince the search engine providers to use those spambooks to DOWNRATE websites
Maybe that would be a starting point for reducing spam entries in the future.
If anyone would like to join in, or knows a good group or forum for this, I would be happy to hear about it. My email address can be found on my website; please feel free to contact me. If anyone is interested, you can find my guestbook at http://www.hereiam.de/NZ/output.php
Contact me for the PHP code.
May 28th, 2007 at 6:28 am
You’re right about keeping your guestbook out of Google: that will stop a number of the spam robots from finding it. Not all, however; I know of a couple of spamming bots that do their own web crawling.
I’ve put quite a lot of effort into defeating the spamming agents, and none of the four guestbooks I manage get spammed now. The most useful measure I’ve incorporated is a PHP session variable that ensures the Add Entry page is only ever reached from the View Entry page. All the bots fail at that stage.
See http://www.braemoor.co.uk/software/antispam.shtml for more details.
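A minimal sketch of the session-variable gate, with a plain dict standing in for PHP’s $_SESSION; view_entries and add_entry are hypothetical names. Clearing the flag on use, so each visit to the View Entry page allows one submission, is my own choice. A bot that keeps cookies and loads the View Entry page first would pass, so this relies on the bots behaving as described.

```python
from typing import Dict, List

def view_entries(session: Dict[str, bool]) -> None:
    # Called whenever the View Entry page is served: remember that this
    # visitor really looked at it.
    session["seen_view_page"] = True

def add_entry(session: Dict[str, bool], form: Dict[str, str],
              entries: List[str]) -> bool:
    # Bots that POST straight to the Add Entry handler never got the flag.
    # pop() clears it, so each view allows exactly one submission.
    if not session.pop("seen_view_page", False):
        return False
    entries.append(form.get("message", ""))
    return True
```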
July 6th, 2007 at 9:18 am
I was getting 2 or 3 spam entries on a good day. Then there were the bad days. I shut down the guestbook for a while until I came up with a new solution, and I have received zero spam with the new system.
Essentially, it is a sign-by-invitation process. A potential signer must first provide a valid email address. They are then sent an email invitation with a unique link that can be used only once and is valid for only 24 hours. Using this link, they can submit their entry. The submission is not published until I review it, although I haven’t had to reject a single entry yet.
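The single-use, 24-hour invitation link could be handled with a random token and an issue time. A sketch, assuming an in-memory store; InvitationStore is hypothetical, and sending the email and the moderation queue are left out. Time is passed in explicitly to keep it easy to test.

```python
import secrets
from typing import Dict, Optional, Tuple

TOKEN_LIFETIME = 24 * 60 * 60  # seconds: links expire after 24 hours

class InvitationStore:
    """In-memory stand-in for wherever pending invitations are kept."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, email: str, now: float) -> str:
        token = secrets.token_urlsafe(16)
        self._pending[token] = (email, now)
        return token  # to be embedded in the link sent by email

    def redeem(self, token: str, now: float) -> Optional[str]:
        """Return the invited address, or None if unknown, used, or expired."""
        entry = self._pending.pop(token, None)  # pop: each link works only once
        if entry is None:
            return None
        email, issued = entry
        if now - issued > TOKEN_LIFETIME:
            return None
        return email
```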
July 18th, 2007 at 9:09 am
Lemat: changing the form field names will only stop bots that harvested your comment form earlier. I see that in only 28% of my entries; the rest actually load the page with the comment form just before hitting the page that receives the POST. In other words, this method would stop only a portion of the spam and may not be worth the hassle.
As for an exclusion in robots.txt, unfortunately you’re still going to be a target for the disobedient bots that deliberately scan robots.txt. I recently placed a honeypot on a page listed in my robots.txt to track hits of this kind. Still, you’d at least block the easier method of finding sites through Google. With the honeypot, if an IP hits both your honeypot and your guestbook, the guestbook entry can be deleted immediately.
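The honeypot cross-check might look like this. It is a sketch assuming a hypothetical trap path that appears only in robots.txt (for example Disallow: /private-trap/), so well-behaved crawlers never request it, and assuming you already have (IP, path) pairs from your logs. Shared addresses such as large ISP proxies could cause the occasional false positive.

```python
from typing import Iterable, List, Set, Tuple

HONEYPOT_PATH = "/private-trap/"  # hypothetical path listed only in robots.txt

def honeypot_ips(log: Iterable[Tuple[str, str]]) -> Set[str]:
    """IPs that requested the trap; `log` holds (client_ip, request_path) pairs."""
    return {ip for ip, path in log if path.startswith(HONEYPOT_PATH)}

def filter_entries(entries: List[dict], trapped: Set[str]) -> List[dict]:
    """Drop guestbook entries posted from IPs that also hit the honeypot."""
    return [entry for entry in entries if entry["ip"] not in trapped]
```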