<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.7" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Another hungry java bot</title>
	<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Fri, 22 Aug 2008 04:21:35 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.7</generator>

	<item>
		<title>by: frank</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-47096</link>
		<pubDate>Mon, 02 Oct 2006 10:49:32 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-47096</guid>
					<description>Why not block all Java bots? This works for me:

SetEnvIfNoCase User-Agent "java/" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot

(save as .htaccess)

I've noticed that some spammers on my site keep using the same IP address over and over again. The smart ones, though, use varying IP addresses, and those are the worst.
My question is: how can you block them when they look like a normal browser?</description>
		<content:encoded><![CDATA[<p>Why not block all Java bots? This works for me:</p>
<p>SetEnvIfNoCase User-Agent "java/" bad_bot<br />
Order Allow,Deny<br />
Allow from all<br />
Deny from env=bad_bot</p>
<p>(save as .htaccess)</p>
<p>I&#8217;ve noticed that some spammers on my site keep using the same IP address over and over again. The smart ones, though, use varying IP addresses, and those are the worst.<br />
My question is: how can you block them when they look like a normal browser?
</p>
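<p>A rough way to see what the <code>"java/"</code> rule catches: <code>SetEnvIfNoCase</code> treats its argument as a regular expression, matches it case-insensitively, and does not require the match to start at the beginning of the User-Agent string. The Python sketch below only approximates that with <code>re.search</code> and <code>re.IGNORECASE</code> (it is not Apache&#8217;s own matching code), and the sample User-Agent strings are made up for illustration.</p>

```python
import re

# Approximates: SetEnvIfNoCase User-Agent "java/" bad_bot
# re.search looks for the pattern anywhere in the string; IGNORECASE mirrors "NoCase".
BAD_BOT = re.compile(r"java/", re.IGNORECASE)


def is_bad_bot(user_agent):
    """Return True if bad_bot would be set for this User-Agent."""
    return BAD_BOT.search(user_agent) is not None


# Hypothetical User-Agent strings, for illustration only.
samples = [
    "Java/1.4.2_05",
    "java/1.5.0",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1",
    "SomeFeedReader/2.0 (using Java/1.5.0)",
]

for ua in samples:
    print("deny " if is_bad_bot(ua) else "allow", ua)
```

<p>The last sample shows the trade-off of an unanchored pattern: it also catches any client that merely mentions Java somewhere in its User-Agent.</p>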
]]></content:encoded>
				</item>
	<item>
		<title>by: Jo</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-32208</link>
		<pubDate>Wed, 16 Aug 2006 09:13:58 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-32208</guid>
					<description>Is this meant to go inside an .htaccess file?</description>
		<content:encoded><![CDATA[<p>Is this meant to go inside an .htaccess file?
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Martin Schuster</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-6656</link>
		<pubDate>Tue, 18 Apr 2006 12:36:50 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-6656</guid>
					<description>I guess you want to use
&lt;code&gt;SetEnvIfNoCase User-Agent ^Java/1\. botsucker=yes&lt;/code&gt;
:)</description>
		<content:encoded><![CDATA[<p>I guess you want to use<br />
<code>SetEnvIfNoCase User-Agent ^Java/1\. botsucker=yes</code><br />
:)
</p>
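<p>Why the anchored, escaped form is tighter: in a regular expression an unescaped <code>.</code> matches any character, and without <code>^</code> the pattern may match anywhere in the string. A small Python sketch comparing the two patterns from this thread, approximating <code>SetEnvIfNoCase</code> with <code>re.search</code> and <code>re.IGNORECASE</code>; the User-Agent strings are hypothetical.</p>

```python
import re

# The two patterns from this thread, approximating SetEnvIfNoCase matching.
LOOSE = re.compile(r"Java/1.", re.IGNORECASE)     # "." is any character; may match anywhere
STRICT = re.compile(r"^Java/1\.", re.IGNORECASE)  # must start the string; "\." is a literal dot


def matches(pattern, user_agent):
    """Return True if the pattern would set botsucker for this User-Agent."""
    return pattern.search(user_agent) is not None


# Hypothetical User-Agent strings chosen to show where the two patterns differ.
for ua in ["Java/1.5.0_06", "Java/10", "MyReader (Java/1.4.2)"]:
    print(ua, "loose:", matches(LOOSE, ua), "strict:", matches(STRICT, ua))
```

<p>For these three samples the loose pattern matches all of them, while the strict one matches only <code>Java/1.5.0_06</code>.</p>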
]]></content:encoded>
				</item>
	<item>
		<title>by: Administrator</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3523</link>
		<pubDate>Sun, 19 Feb 2006 09:49:12 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3523</guid>
					<description>SetEnvIfNoCase User-Agent "Java/1." botsucker=yes
SetEnvIfNoCase User-Agent "Snoopy" botsucker=yes
Deny from env=botsucker</description>
		<content:encoded><![CDATA[<p>SetEnvIfNoCase User-Agent "Java/1." botsucker=yes<br />
SetEnvIfNoCase User-Agent "Snoopy" botsucker=yes<br />
Deny from env=botsucker
</p>
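<p>Several <code>SetEnvIfNoCase</code> lines can set the same variable, so <code>botsucker</code> ends up set when any one of the patterns matches, and a single <code>Deny from env=botsucker</code> then covers them all. A minimal Python sketch of that decision, again approximating Apache&#8217;s case-insensitive regex matching with <code>re.search</code>; the function names and sample User-Agents are hypothetical.</p>

```python
import re

# One compiled pattern per SetEnvIfNoCase line; each sets botsucker when it matches,
# so the variable ends up set if ANY of them matches (a logical OR).
BOTSUCKER_PATTERNS = [
    re.compile(r"Java/1.", re.IGNORECASE),
    re.compile(r"Snoopy", re.IGNORECASE),
]


def botsucker_set(user_agent):
    """Mimic the SetEnvIfNoCase step: True if any pattern matches the User-Agent."""
    return any(p.search(user_agent) for p in BOTSUCKER_PATTERNS)


def response_status(user_agent):
    """Mimic 'Deny from env=botsucker': 403 Forbidden when the variable is set."""
    return 403 if botsucker_set(user_agent) else 200


# Hypothetical User-Agent strings.
for ua in ["Java/1.4.1_04", "Snoopy v1.2", "Opera/8.51 (Windows NT 5.1; U; en)"]:
    print(response_status(ua), ua)
```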
]]></content:encoded>
				</item>
	<item>
		<title>by: Ajay D'Souza</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3514</link>
		<pubDate>Sun, 19 Feb 2006 05:37:26 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3514</guid>
					<description>I've noticed a similar problem.

How do I block the user agent directly, rather than just blocking the IPs?</description>
		<content:encoded><![CDATA[<p>I&#8217;ve noticed a similar problem.</p>
<p>How do I block the user agent directly, rather than just blocking the IPs?
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Joe</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3402</link>
		<pubDate>Wed, 15 Feb 2006 01:47:53 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3402</guid>
					<description>Alden is talking about something that has been around since before the new personalized search options. Google has an &lt;a href="http://www.google.com/webmasters/remove.html" rel="nofollow"&gt;Automatic URL Removal System&lt;/a&gt;. Submitting a URL won't remove a page on its own; Google will quickly check (rather than waiting for your site's next crawl) whether the URL has been added to the site's robots.txt and, if so, remove it. Because it checks the site's robots.txt, I doubt it could be used to remove a URL maliciously.</description>
		<content:encoded><![CDATA[<p>Alden is talking about something that has been around since before the new personalized search options. Google has an <a href="http://www.google.com/webmasters/remove.html" rel="nofollow">Automatic URL Removal System</a>. Submitting a URL won&#8217;t remove a page on its own; Google will quickly check (rather than waiting for your site&#8217;s next crawl) whether the URL has been added to the site&#8217;s robots.txt and, if so, remove it. Because it checks the site&#8217;s robots.txt, I doubt it could be used to remove a URL maliciously.
</p>
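<p>The check Joe describes boils down to asking whether the site&#8217;s robots.txt now disallows the URL. Python&#8217;s standard <code>urllib.robotparser</code> module can answer that same question; the robots.txt lines, the <code>Googlebot</code> agent name and the example.com URLs below are hypothetical, and this says nothing about how Google&#8217;s own tool is implemented.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content after the site owner disallowed one page.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/old-page.html",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# A URL listed under Disallow is no longer fetchable; everything else still is.
for url in ["http://example.com/private/old-page.html", "http://example.com/index.html"]:
    print(url, parser.can_fetch("Googlebot", url))
```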
]]></content:encoded>
				</item>
	<item>
		<title>by: gpshewan</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3385</link>
		<pubDate>Tue, 14 Feb 2006 11:02:53 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3385</guid>
					<description>I decided a month or so ago to block all Java user agents. I can't see the need for them, and I don't see why any legitimate agent should identify itself as Java. I agree with Dirk on that: if somebody uses a feed reader that doesn't say what it is in the UA and sends a stock 'Java/x.x.x' string, then tough.

Alden, the Remove URL function is only a UI experiment (still?), and there's no way for anybody to remove a site's URL from Google, only from their own search results. Google are the only ones who remove things from the index :)</description>
		<content:encoded><![CDATA[<p>I decided a month or so ago to block all Java user agents. I can&#8217;t see the need for them, and I don&#8217;t see why any legitimate agent should identify itself as Java. I agree with Dirk on that: if somebody uses a feed reader that doesn&#8217;t say what it is in the UA and sends a stock &#8216;Java/x.x.x&#8217; string, then tough.</p>
<p>Alden, the Remove URL function is only a UI experiment (still?), and there&#8217;s no way for anybody to remove a site&#8217;s URL from Google, only from their own search results. Google are the only ones who remove things from the index :)
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Alden Bates</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3379</link>
		<pubDate>Tue, 14 Feb 2006 05:33:10 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3379</guid>
					<description>If you do, avoid blocking 217.78.47.35, 216.239.3*.* and 66.102.6.136

When you use Google's Remove URL function, a Google bot polls the entered URL every so often using a Java user agent. If a site blocks all Java user agents, a malicious party could get that site's pages removed from Google.</description>
		<content:encoded><![CDATA[<p>If you do, avoid blocking 217.78.47.35, 216.239.3*.* and 66.102.6.136</p>
<p>When you use Google&#8217;s Remove URL function, a Google bot polls the entered URL every so often using a Java user agent. If a site blocks all Java user agents, a malicious party could get that site&#8217;s pages removed from Google.
</p>
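<p>Combining a Java block with Alden&#8217;s exceptions gives the rule &#8220;deny Java user agents unless the client address is on an exempt list&#8221;. A minimal Python sketch of that decision, using <code>fnmatch</code> for the wildcard notation; reading <code>216.239.3*.*</code> as a shell-style wildcard is an assumption about what Alden meant, the <code>^Java/</code> pattern is a simplified version of Martin&#8217;s anchored pattern, and the function names are hypothetical.</p>

```python
import fnmatch
import re

# Addresses from Alden's comment; "216.239.3*.*" is read as a shell-style wildcard (assumption).
EXEMPT_ADDRESSES = ["217.78.47.35", "216.239.3*.*", "66.102.6.136"]
JAVA_UA = re.compile(r"^Java/", re.IGNORECASE)


def is_exempt(ip):
    """True if the client address matches one of the exempt patterns."""
    return any(fnmatch.fnmatch(ip, pattern) for pattern in EXEMPT_ADDRESSES)


def should_deny(user_agent, ip):
    """Deny Java user agents, except when they come from an exempt address."""
    return JAVA_UA.search(user_agent) is not None and not is_exempt(ip)


print(should_deny("Java/1.4.1_04", "216.239.37.5"))  # False: exempt address
print(should_deny("Java/1.4.1_04", "192.0.2.10"))    # True: not on the list
print(should_deny("Opera/8.51 (Windows NT 5.1; U; en)", "192.0.2.10"))  # False: not Java
```

<p>In .htaccess terms, one way to express such exceptions is <code>Order Deny,Allow</code> plus <code>Allow from</code> lines for the exempt addresses, since under that order a matching <code>Allow</code> overrides a <code>Deny</code>.</p>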
]]></content:encoded>
				</item>
	<item>
		<title>by: Dirk</title>
		<link>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3365</link>
		<pubDate>Mon, 13 Feb 2006 21:06:09 +0000</pubDate>
		<guid>http://spamhuntress.com/2006/02/13/another-hungry-java-bot/#comment-3365</guid>
					<description>Block it. The only legitimate requests from Java user agents we're seeing are for our RSS feeds, and in that case I'd say: tough luck, use some other RSS reader.</description>
		<content:encoded><![CDATA[<p>Block it. The only legitimate requests from Java user agents we&#8217;re seeing are for our RSS feeds, and in that case I&#8217;d say: tough luck, use some other RSS reader.
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
