<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.7" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Suss out spam networks</title>
	<link>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Mon, 13 Oct 2008 11:45:43 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.7</generator>

	<item>
		<title>by: TrackBacks &#187; Blog Archive &#187; links for 2006-12-09</title>
		<link>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-74840</link>
		<pubDate>Sat, 09 Dec 2006 04:38:20 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-74840</guid>
					<description>[...] Spamhuntress » Blog Archive » Suss out spam networks (tags: slogs a2b solution) [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Spamhuntress » Blog Archive » Suss out spam networks (tags: slogs a2b solution) [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Hacks and Gadgets by HJL &#187; Blog Archive &#187; What (spam) sites are at that IP address?</title>
		<link>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2384</link>
		<pubDate>Sat, 14 Jan 2006 20:35:09 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2384</guid>
					<description>[...] sting services only shows some of the sites that are at their respective IP addresses. via SpamHuntress, see also comments there from Sam Critchley of A2B. [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] sting services only shows some of the sites that are at their respective IP addresses. via SpamHuntress, see also comments there from Sam Critchley of A2B. [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: A2B and More &#187; A2B blocklist makes it to Spam Huntress</title>
		<link>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2230</link>
		<pubDate>Mon, 02 Jan 2006 22:37:58 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2230</guid>
					<description>[...] 006 to Uncategorized Spam Huntress Ann Elisabeth has given our splog blocklist a mention on her blog. Thanks Ann Elisabeth! I left a comment explaining how we put the [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] 006 to Uncategorized<br />
Spam Huntress Ann Elisabeth has given our splog blocklist a mention on her blog. Thanks Ann Elisabeth! I left a comment explaining how we put the [&#8230;]
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Sam Critchley</title>
		<link>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2229</link>
		<pubDate>Mon, 02 Jan 2006 22:32:25 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/12/31/suss-out-spam-networks/#comment-2229</guid>
					<description>Hi Ann Elisabeth,

Many thanks for making a mention of our splog blocklist on your blog. Thanks also for alerting us to the nofollow issue. ;-)

I thought you and your readers might like to know how we pull the list of blocked IP addresses together.

Firstly, at A2B (see http://www.a2b.cc for more) we run a search engine with a ping interface. Bloggers who have geo-located META tags in their HTML (see http://www.a2b.cc/help-searching-addurl-blogping.a2b for more) can ping us and we'll pick up (parse) their page and index it in our geosearch engine. We receive pings from many individual bloggers, a full ping feed from pingomatic.com, and bulk pings from several other sources, usually around 700,000 to 1 million pings per day. With approximately 200 IP addresses in the blocklist, about 37% of daily pings are blocked immediately.

To generate the list, we recorded the URL of each website (read: blog) we were pinged with and resolved it to the IP address of the web server hosting that URL. We recorded the IP address and incremented a counter every time we received a ping for a URL on the same web server. We soon noticed that we were getting many thousands of pings for the same IP addresses, so we pulled a script together to list the top IP addresses by number of pings.

We built another script which showed all the URLs associated with each IP address. In order to decide which web server IP addresses to block we open this script and manually have a look at a random sampling of the URLs - it's usually pretty easy to tell if they're splogs as they're just full of advertising links or are quite random in their choice of subject matter. Any web server which has real blogs tends to stay off the blocklist (so that rules out blogspot.com even though people are using it for splogging).

As soon as we'd blocked the first 112 IP addresses, the amount of traffic we were using parsing blogs dropped from 27GB per day (it was so high that it was costing us money in hosting charges) to 6GB/day. Of course, it began to creep up again soon after, so we're realising it's an ongoing effort and are beginning to think about blocking whole ranges of IP addresses.

We're also going to put up a text-only list in IP address order - with any luck it'll be ready in the next day or so.

I really hope that someone takes some action against sploggers on the hosted blogging services soon though, as it's costing us money on our hosting and it's a waste of time - we're a two-man show and don't have the time to keep writing scripts to fight it.

Keep up the fight! Best wishes, Sam</description>
		<content:encoded><![CDATA[<p>Hi Ann Elisabeth,</p>
<p>Many thanks for making a mention of our splog blocklist on your blog. Thanks also for alerting us to the nofollow issue. <img src='http://spamhuntress.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>I thought you and your readers might like to know how we pull the list of blocked IP addresses together.</p>
<p>Firstly, at A2B (see <a href="http://www.a2b.cc" rel="nofollow">http://www.a2b.cc</a> for more) we run a search engine with a ping interface. Bloggers who have geo-located META tags in their HTML (see <a href="http://www.a2b.cc/help-searching-addurl-blogping.a2b" rel="nofollow">http://www.a2b.cc/help-searching-addurl-blogping.a2b</a> for more) can ping us and we&#8217;ll pick up (parse) their page and index it in our geosearch engine. We receive pings from many individual bloggers, a full ping feed from pingomatic.com, and bulk pings from several other sources, usually around 700,000 to 1 million pings per day. With approximately 200 IP addresses in the blocklist, about 37% of daily pings are blocked immediately.</p>
<p>To generate the list, we recorded the URL of each website (read: blog) we were pinged with and resolved it to the IP address of the web server hosting that URL. We recorded the IP address and incremented a counter every time we received a ping for a URL on the same web server. We soon noticed that we were getting many thousands of pings for the same IP addresses, so we pulled a script together to list the top IP addresses by number of pings.</p>
<p>We built another script which showed all the URLs associated with each IP address. In order to decide which web server IP addresses to block we open this script and manually have a look at a random sampling of the URLs - it&#8217;s usually pretty easy to tell if they&#8217;re splogs as they&#8217;re just full of advertising links or are quite random in their choice of subject matter. Any web server which has real blogs tends to stay off the blocklist (so that rules out blogspot.com even though people are using it for splogging).</p>
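<p>The counting and sampling steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script A2B actually ran: the class name PingLog, its methods, the injectable resolver, and the example hostnames and addresses are all hypothetical, and a real deployment would also need to handle DNS failures and hosts with several addresses.</p>
<pre>
```python
import random
import socket
from collections import Counter, defaultdict
from urllib.parse import urlparse


def resolve_ip(url):
    """Resolve the host part of a pinged URL to an IPv4 address (needs DNS)."""
    return socket.gethostbyname(urlparse(url).hostname)


class PingLog:
    """Hypothetical tally of pings per web-server IP, plus the URLs seen on each."""

    def __init__(self, resolver=resolve_ip):
        self.resolver = resolver            # injectable so the sketch runs offline
        self.ping_counts = Counter()        # IP address to number of pings received
        self.urls_by_ip = defaultdict(set)  # IP address to distinct URLs pinged

    def record(self, url):
        """Count one ping against the IP of the server hosting this URL."""
        ip = self.resolver(url)
        self.ping_counts[ip] += 1
        self.urls_by_ip[ip].add(url)

    def top_ips(self, n=10):
        """Return (ip, ping_count) pairs, busiest server first."""
        return self.ping_counts.most_common(n)

    def sample_urls(self, ip, k=5, seed=None):
        """Return up to k randomly chosen URLs on one IP, for manual review."""
        urls = sorted(self.urls_by_ip[ip])
        return random.Random(seed).sample(urls, min(k, len(urls)))


# Offline demo with a made-up host table (documentation-range addresses).
fake_dns = {
    "splog-a.example": "192.0.2.10",
    "splog-b.example": "192.0.2.10",
    "blog.example": "192.0.2.20",
}
log = PingLog(resolver=lambda url: fake_dns[urlparse(url).hostname])
for url in [
    "http://splog-a.example/1",
    "http://splog-b.example/2",
    "http://splog-a.example/1",
    "http://blog.example/post",
]:
    log.record(url)

print(log.top_ips(2))  # [('192.0.2.10', 3), ('192.0.2.20', 1)]
```
</pre>
<p>The decision to block stays manual in this sketch, as in the process described above: sample_urls only hands a reviewer a few URLs per IP address to look at.</p>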
<p>As soon as we&#8217;d blocked the first 112 IP addresses, the amount of traffic we were using parsing blogs dropped from 27GB per day (it was so high that it was costing us money in hosting charges) to 6GB/day. Of course, it began to creep up again soon after, so we&#8217;re realising it&#8217;s an ongoing effort and are beginning to think about blocking whole ranges of IP addresses.</p>
<p>We&#8217;re also going to put up a text-only list in IP address order - with any luck it&#8217;ll be ready in the next day or so.</p>
<p>I really hope that someone takes some action against sploggers on the hosted blogging services soon though, as it&#8217;s costing us money on our hosting and it&#8217;s a waste of time - we&#8217;re a two-man show and don&#8217;t have the time to keep writing scripts to fight it.</p>
<p>Keep up the fight! Best wishes, Sam
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
