Another hungry Java bot
I had another spike in my bandwidth meter today.
Perpetrator:
210.177.215.29
from Hong Kong
User agent:
Java/1.4.1_04
This one wasn’t too bad in terms of how much it downloaded, I think; I haven’t checked for sure.
But 152 requests from 10:01:21 to 10:04:34 is VERY inconsiderate at best.
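To put a number on that burst, here is a minimal Python sketch that turns the two timestamps and the request count above into a rate. The helper name `request_rate` is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def request_rate(first: str, last: str, count: int) -> float:
    """Return requests per second between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    elapsed = (datetime.strptime(last, fmt) - datetime.strptime(first, fmt)).total_seconds()
    if elapsed <= 0:
        raise ValueError("last must be later than first")
    return count / elapsed


# Figures taken from the post: 152 requests from 10:01:21 to 10:04:34 (193 seconds).
rate = request_rate("10:01:21", "10:04:34", 152)
print(f"{rate:.2f} requests per second")  # prints "0.79 requests per second"
```

That works out to roughly one request every 1.3 seconds, sustained for more than three minutes.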
I’m tired of this. I’ll block anything with Java in the user agent, unless you guys can find some reason not to?
Here’s another hungry bot:
Hungry Java bot
Oh, and I ran a grep on my logs. A few days in February netted these IPs, all using a Java/1.4-something user agent:
62.163.12.31 (came back another time)
63.230.22.115
82.170.231.97
84.36.69.19
84.176.66.18
84.176.74.179
84.178.149.81
163.17.205.1
207.91.139.189
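For anyone who wants to repeat that search with per-IP counts, the following is a rough Python sketch rather than the grep actually used here. It assumes the logs are in Apache's "combined" format; the function name `java_agent_hits` and the sample lines (which use documentation-only IP addresses) are invented for illustration.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def java_agent_hits(lines):
    """Count requests per client IP whose user agent starts with "Java/"."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith("Java/"):
            hits[match.group(1)] += 1
    return hits


# Invented sample lines, not taken from the real logs.
sample = [
    '192.0.2.10 - - [01/Feb/2006:10:01:21 +0000] "GET / HTTP/1.1" 200 5120 "-" "Java/1.4.1_04"',
    '192.0.2.10 - - [01/Feb/2006:10:01:22 +0000] "GET /feed/ HTTP/1.1" 200 2048 "-" "Java/1.4.1_04"',
    '198.51.100.7 - - [01/Feb/2006:10:02:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

for ip, count in java_agent_hits(sample).most_common():
    print(ip, count)
```

With a real log, you would pass an open file object instead of the `sample` list, for example `java_agent_hits(open("access.log"))`, where the file name is again only a placeholder.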
February 13th, 2006 at 3:06 pm
Block it. The only legit requests from Java user agents we’re seeing are for our RSS feeds. In which case I’d say - tough luck, use some other RSS reader.
February 13th, 2006 at 11:33 pm
If you do, avoid blocking 217.78.47.35, 216.239.3*.* and 66.102.6.136
When you use Google’s Remove URL function, they have a bot which polls the entered URL every so often using a Java user agent. If a site blocks all Java user agents, then it’s possible for a malicious party to get that site’s pages removed from Google.
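The rule being suggested (block Java user agents, but let the listed addresses through) can be sketched in Python as plain decision logic. This is not server configuration, just an illustration: the names `ALLOWED_IP_PATTERNS` and `should_block` are hypothetical, and reading `216.239.3*.*` as a shell-style wildcard is an assumption.

```python
from fnmatch import fnmatchcase

# Addresses from the comment above; the wildcard entry is treated as a
# shell-style glob, which is an assumption about what was meant.
ALLOWED_IP_PATTERNS = ["217.78.47.35", "216.239.3*.*", "66.102.6.136"]


def should_block(ip: str, user_agent: str) -> bool:
    """Block Java user agents unless the client IP matches an allowed pattern."""
    if not user_agent.lower().startswith("java/"):
        return False  # not a Java agent, so this rule does not apply
    return not any(fnmatchcase(ip, pattern) for pattern in ALLOWED_IP_PATTERNS)


print(should_block("210.177.215.29", "Java/1.4.1_04"))  # True: Java agent, IP not allowed
print(should_block("66.102.6.136", "Java/1.4.1_04"))    # False: IP is on the allow list
```

Note that a glob `*` also matches dots, so the wildcard entry is looser than a strict per-octet check.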
February 14th, 2006 at 5:02 am
I took the decision a month or so ago to block all Java user agents. Can’t see the need for them, don’t see why any legit agent should be Java. I’d agree with Dirk on that … if somebody is using a feedreader that doesn’t say what it is in the UA and uses a stock ‘Java/x.x.x.’ field then tough.
Alden, the remove URL function is only a UI experiment (still?) and there’s no way for anybody to remove a site’s URL from Google … just from their own search results. Google are the only ones who do the removing from the index.
February 14th, 2006 at 7:47 pm
Alden is talking about stuff that has been around since before the new personalized search options. Google has an Automatic URL Removal System. Submitting a URL won’t remove a page on its own; Google will quickly check (rather than waiting until your next site crawl) that the URL has been added to the site’s robots.txt, and only then remove it. Since it checks the site’s robots.txt, I doubt it could be used to maliciously remove a URL.
February 18th, 2006 at 11:37 pm
I’ve noticed a similar problem.
How do I block the user agent directly, as opposed to just blocking the IPs?
February 19th, 2006 at 3:49 am
SetEnvIfNoCase User-Agent "Java/1." botsucker=yes
SetEnvIfNoCase User-Agent "Snoopy" botsucker=yes
Deny from env=botsucker
April 18th, 2006 at 6:36 am
I guess you want to use
SetEnvIfNoCase User-Agent ^Java/1\. botsucker=yes
:)
August 16th, 2006 at 3:13 am
Is this for inside an .htaccess file?
October 2nd, 2006 at 4:49 am
Why not block all Java bots? It works for me like this:
SetEnvIfNoCase User-Agent "java/" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
(save as .htaccess)
I’ve noticed on my site that some spammers use the same IP address over and over again. The smart ones use varying IP addresses, though, and these are the worst.
My question is: how can you block these when they look like a normal browser?