<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.7" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Comment spam trainee?</title>
	<link>http://spamhuntress.com/2005/10/18/comment-spam-trainee/</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Mon, 06 Oct 2008 19:13:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.7</generator>

	<item>
		<title>by: Search Engines Web  :- O</title>
		<link>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1655</link>
		<pubDate>Tue, 25 Oct 2005 20:48:23 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1655</guid>
					<description>///// ALWAYS check out those google referrers with no search terms!



What do a GOOGLE referrer and an MSN referrer with no search terms mean? Are they in fact falsely created referrer strings? Does certain information in a referrer NEUTRALIZE duplicates - in other words, does creating a false referrer in a string eliminate the default info due to the limitations of server log technology?

Those two Search Engines seem to produce a relatively large number of them - but Yahoo seldom does.

When doing a "feel lucky" search, the referrer keywords did in fact appear.</description>
		<content:encoded><![CDATA[<p>///// ALWAYS check out those google referrers with no search terms!</p>
<p>What do a GOOGLE referrer and an MSN referrer with no search terms mean? Are they in fact falsely created referrer strings? Does certain information in a referrer NEUTRALIZE duplicates - in other words, does creating a false referrer in a string eliminate the default info due to the limitations of server log technology?</p>
<p>Those two Search Engines seem to produce a relatively large number of them - but Yahoo seldom does.</p>
<p>When doing a &#8220;feel lucky&#8221; search, the referrer keywords did in fact appear.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: jon</title>
		<link>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1622</link>
		<pubDate>Fri, 21 Oct 2005 15:11:07 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1622</guid>
					<description>Wow, so my trackback spam didn't work?</description>
		<content:encoded><![CDATA[<p>Wow, so my trackback spam didn&#8217;t work?
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Paulo</title>
		<link>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1613</link>
		<pubDate>Tue, 18 Oct 2005 16:15:49 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1613</guid>
					<description>I've found that "google.com" also comes up as a referrer when someone gets to your site via a click on "I'm Feeling Lucky," so you really have to check the originating server.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve found that &#8220;google.com&#8221; also comes up as a referrer when someone gets to your site via a click on &#8220;I&#8217;m Feeling Lucky,&#8221; so you really have to check the originating server.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Olliver</title>
		<link>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1612</link>
		<pubDate>Tue, 18 Oct 2005 13:59:36 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/10/18/comment-spam-trainee/#comment-1612</guid>
					<description>I've seen quite a lot of these test runs over the past years, the "Google check" being only one of them. Other variants may include: 
an obviously faked or empty user agent string from the IP range of hosting companies (like ThePlanet and EV1)
continuously changing IP addresses (typically open proxies) with the same user agent hitting a particular page that is normally not requested very often
visitors hitting the site with continuously changing user agents (switching between bots and browsers)
visitors with browser user agents making HEAD requests
A couple of entries in Apache's error log complaining about HTTP violations, like sending HTTP/1.1 without a Host header or GET requests containing backslashes
Some of these spambots even reveal an odd sense of humour, like for instance only spamming posts related to, well, referrer spam :-). But in any case the test run will either include a URL that looks unsuspicious or won't contain a referrer at all.

The bot nature will become apparent if you look into Apache's log files, as there won't be any CSS, JS, or image files requested (the ones embedded in the page delivered) - just the page/post itself, because in most cases the bots are too dumb to understand HTML. Also, a lot of the HTTP headers usually sent with a browser request are missing (in most cases you only have the Host, User-Agent and Referer headers defined). So checking for missing headers in requests made by typical browser user agents may be a criterion for filter rules (mod_rewrite would be the preferred choice here).</description>
		<content:encoded><![CDATA[<p>I&#8217;ve seen quite a lot of these test runs over the past years, the &#8220;Google check&#8221; being only one of them. Other variants may include:<br />
an obviously faked or empty user agent string from the IP range of hosting companies (like ThePlanet and EV1)<br />
continuously changing IP addresses (typically open proxies) with the same user agent hitting a particular page that is normally not requested very often<br />
visitors hitting the site with continuously changing user agents (switching between bots and browsers)<br />
visitors with browser user agents making HEAD requests<br />
A couple of entries in Apache&#8217;s error log complaining about HTTP violations, like sending HTTP/1.1 without a Host header or GET requests containing backslashes<br />
Some of these spambots even reveal an odd sense of humour, like for instance only spamming posts related to, well, referrer spam <img src='http://spamhuntress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . But in any case the test run will either include a URL that looks unsuspicious or won&#8217;t contain a referrer at all.</p>
<p>The bot nature will become apparent if you look into Apache&#8217;s log files, as there won&#8217;t be any CSS, JS, or image files requested (the ones embedded in the page delivered) - just the page/post itself, because in most cases the bots are too dumb to understand HTML. Also, a lot of the HTTP headers usually sent with a browser request are missing (in most cases you only have the Host, User-Agent and Referer headers defined). So checking for missing headers in requests made by typical browser user agents may be a criterion for filter rules (mod_rewrite would be the preferred choice here).
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
