Suss out spam networks
I was running down splogs that refer to one of my other sites (nativecelebs has good SERPs and content, so it often gets included on junk sites).
And I found one blog that I suspected was part of a splog network, with blogs living in various folders. But the root didn’t work, so I turned to Google.
And found this:
It’s a list of URLs that point (or once pointed) to one particular IP address.
And it looks like it’s part of a blocklist that susses out splogs. Have a look!
And I was absolutely right. The domain is part of a scraper splog network.
About a2b: They’ve now added nofollow to the spam URLs, so I’ve dropped the nofollow on my links!
Oh, and the spammer behind the splogs? He’s best known as peteinoz on forums and such. He owns wpburner, catbcreator, completeadsensetoolkit and probably a few others. All of them are splog creation, posting and pinging tools. Matt, are you there with your red crayon? These need banning. The guy talks openly about splogging. I’m debating whether or not I’ll post the URL to his forum, so you can see for yourself. Hmmm…
Actually, most of the URLs on that blocklist should be banned. Most are probably generated with wpburner or similar.
Hi Ann Elisabeth,
Many thanks for making a mention of our splog blocklist on your blog. Thanks also for alerting us to the nofollow issue.
I thought you and your readers might like to know how we pull the list of blocked IP addresses together.
Firstly, at A2B (see http://www.a2b.cc for more) we run a search engine with a ping interface. Bloggers who have geo-located META tags in their HTML (see http://www.a2b.cc/help-searching-addurl-blogping.a2b for more) can ping us and we’ll pick up (parse) their page and index it in our geosearch engine. We receive pings from many individual bloggers, a full ping feed from pingomatic.com, and bulk pings from several other sources, usually around 700,000 to 1 million pings per day. With approximately 200 IP addresses in the blocklist, about 37% of daily pings are blocked immediately.
To generate the list, we recorded the URL of each website (read: blog) we were pinged with and also resolved it to the IP address of the web server hosting that URL. We recorded the IP address and incremented a counter every time we received a ping for a URL on the same web server. We soon noticed that we were getting many thousands of pings for the same IP addresses, so we pulled a script together to list the top IP addresses by number of pings.
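For readers who want to try the counting step themselves, here is a minimal sketch in Python. It is not A2B's actual script; the function names (count_pings_by_ip, top_ips) and the pluggable resolve parameter are assumptions made for illustration. It resolves each pinged URL's hostname to an IP address and tallies pings per IP.

```python
import socket
from collections import Counter
from urllib.parse import urlparse


def count_pings_by_ip(urls, resolve=socket.gethostbyname):
    """Tally pinged URLs by the IP address of the server hosting them.

    `resolve` is a hypothetical hook: any callable mapping hostname -> IP.
    It defaults to a real DNS lookup, but can be swapped for a cache or a stub.
    """
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname
        if host is None:
            continue  # not a usable URL, skip it
        try:
            ip = resolve(host)
        except OSError:
            continue  # hostname did not resolve (socket.gaierror is an OSError)
        counts[ip] += 1
    return counts


def top_ips(counts, n=10):
    """Return the n IP addresses with the most pings, highest first."""
    return counts.most_common(n)
```

At a volume of hundreds of thousands of pings a day, a real setup would presumably cache DNS lookups rather than resolve every URL afresh; that detail is left out here.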
We built another script which showed all the URLs associated with each IP address. In order to decide which web server IP addresses to block we open this script and manually have a look at a random sampling of the URLs - it’s usually pretty easy to tell if they’re splogs as they’re just full of advertising links or are quite random in their choice of subject matter. Any web server which has real blogs tends to stay off the blocklist (so that rules out blogspot.com even though people are using it for splogging).
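The review step described above could be sketched roughly as follows. Again, this is an illustration rather than A2B's code: the names urls_by_ip and sample_urls, and the (ip, url) pair format, are assumptions. It groups the recorded URLs by IP address and draws a small random sample for a human to inspect.

```python
import random
from collections import defaultdict


def urls_by_ip(pairs):
    """Group (ip, url) pairs into a mapping of IP -> set of distinct URLs."""
    grouped = defaultdict(set)
    for ip, url in pairs:
        grouped[ip].add(url)
    return grouped


def sample_urls(grouped, ip, k=5, rng=random):
    """Pick up to k URLs at random from one IP for manual inspection.

    `rng` is a hypothetical hook; pass random.Random(seed) for repeatable picks.
    """
    urls = sorted(grouped.get(ip, ()))
    return rng.sample(urls, min(k, len(urls)))
```

The decision to block stays with a person looking at the sampled pages; the script only narrows down what to look at.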
As soon as we’d blocked the first 112 IP addresses, the amount of traffic we were using parsing blogs dropped from 27GB per day (it was so high that it was costing us money in hosting charges) to 6GB/day. Of course, it began to creep up again soon after, so we’re realising it’s an ongoing effort and are beginning to think about blocking whole ranges of IP addresses.
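Blocking whole ranges rather than single addresses can be sketched with Python's standard ipaddress module. This is only one possible approach, not something A2B has described implementing; the function names and the example addresses (taken from the reserved documentation ranges) are assumptions.

```python
import ipaddress


def parse_blocklist(entries):
    """Turn strings such as '192.0.2.7' or '198.51.100.0/24' into network objects.

    A bare address becomes a single-host network, so one list can hold
    both individual IPs and whole ranges.
    """
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_blocked(ip, networks):
    """Return True if the IP falls inside any blocked address or range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


# Example with made-up entries: one single address and one /24 range.
blocklist = parse_blocklist(["192.0.2.7", "198.51.100.0/24"])
```

A ping whose URL resolves to a blocked address would then be dropped before the page is fetched and parsed, which is where the bandwidth is spent.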
We’re also going to put up a text-only list in IP address order - with any luck it’ll be ready in the next day or so.
I really hope that someone takes some action against sploggers on the hosted blogging services soon though, as it’s costing us money on our hosting and it’s a waste of time - we’re a two-man show and don’t have the time to keep writing scripts to fight it.
Keep up the fight! Best wishes, Sam
[...] 006 to Uncategorized
Spam Huntress Ann Elisabeth has given our splog blocklist a mention on her blog. Thanks Ann Elisabeth! I left a comment explaining how we put the [...]
[...] sting services only shows some of the sites that are at their respective IP addresses. via SpamHuntress, see also comments there from Sam Critchely of A2B. [...]
[...] Spamhuntress » Blog Archive » Suss out spam networks (tags: slogs a2b solution) [...]