Another hungry java bot

I had another spike in my bandwidth meter today.

Perpetrator:
210.177.215.29
from Hong Kong

User agent:
Java/1.4.1_04

This one wasn’t too bad in terms of how much it downloaded, I think. I haven’t checked for sure.

But 152 requests from 10:01:21 to 10:04:34 is VERY inconsiderate at best.
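
For a sense of scale, the arithmetic can be sketched in a few lines of Python. The timestamps and request count are the ones quoted above; the variable names are only illustrative.

```python
from datetime import datetime

# Figures quoted in the post: 152 requests between 10:01:21 and 10:04:34.
first = datetime.strptime("10:01:21", "%H:%M:%S")
last = datetime.strptime("10:04:34", "%H:%M:%S")
requests = 152

elapsed = (last - first).total_seconds()  # seconds between first and last request
rate = requests / elapsed                 # average requests per second

print(f"{requests} requests in {elapsed:.0f} s = {rate:.2f} requests/s")
```

That works out to 193 seconds, or roughly one request every 1.3 seconds on average.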

I’m tired of this. I’ll block anything with Java in the user agent, unless you guys can find some reason not to?

Here’s another hungry bot:
Hungry Java bot

Oh, and I ran a grep on my logs. A few days in February netted these IPs, all with a Java/1.4-something user agent:

62.163.12.31 (came back another time)
63.230.22.115
82.170.231.97
84.36.69.19
84.176.66.18
84.176.74.179
84.178.149.81
163.17.205.1
207.91.139.189
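
The exact grep isn’t shown here, but the same kind of search can be sketched in Python. This is a minimal example that assumes the server writes Apache’s "combined" log format; the function name `java_agent_hits` and the sample log lines are made up for illustration and are not taken from the real logs.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)

def java_agent_hits(lines):
    """Count requests per IP whose User-Agent starts with "Java/"."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("agent").lower().startswith("java/"):
            hits[match.group("ip")] += 1
    return hits

# Hypothetical sample lines, invented for illustration (not real log data).
SAMPLE = [
    '192.0.2.10 - - [12/Feb/2005:10:01:21 +0000] "GET / HTTP/1.1" 200 5120 "-" "Java/1.4.1_04"',
    '192.0.2.10 - - [12/Feb/2005:10:01:22 +0000] "GET /about HTTP/1.1" 200 2048 "-" "Java/1.4.1_04"',
    '198.51.100.7 - - [12/Feb/2005:10:02:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for ip, count in java_agent_hits(SAMPLE).most_common():
        print(ip, count)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `java_agent_hits(open("access.log"))`, where the file name is whatever your server actually uses.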

9 Responses to “Another hungry java bot”

  1. Dirk Says:

    Block it. The only legit requests from Java user agents we’re seeing are for our RSS feeds. In which case I’d say: tough luck, use some other RSS reader.

  2. Alden Bates Says:

    If you do, avoid blocking 217.78.47.35, 216.239.3*.* and 66.102.6.136

    When you use Google’s Remove URL function, they have a bot which polls the entered URL every so often using a Java user agent. If a site blocks all Java user agents, then it’s possible for a malicious party to get them removed from Google.

  3. gpshewan Says:

    I took the decision a month or so ago to block all Java user agents. Can’t see the need for them, don’t see why any legit agent should be Java. I’d agree with Dirk on that … if somebody is using a feedreader that doesn’t say what it is in the UA and uses a stock ‘Java/x.x.x.’ field then tough.

    Alden, the remove URL function is only a UI experiment (still?) and there’s no way for anybody to remove a site’s URL from Google … just from their own search results. Google are the only ones who do the removing from the index :)

  4. Joe Says:

    Alden is talking about something that predates the new personalized search options. Google has an Automatic URL Removal System. Submitting a URL won’t remove a page on its own; Google will quickly check (rather than waiting until your next site crawl) that the URL has been added to the site’s robots.txt, and only then remove it. Since it checks the site’s robots.txt, I doubt it could be used to maliciously remove a URL.

  5. Ajay D'Souza Says:

    I’ve noticed a similar problem.

    How do I block the user agent directly, as opposed to just blocking the IPs?

  6. Administrator Says:

    SetEnvIfNoCase User-Agent "Java/1." botsucker=yes
    SetEnvIfNoCase User-Agent "Snoopy" botsucker=yes
    Deny from env=botsucker

  7. Martin Schuster Says:

    I guess you want to use
    SetEnvIfNoCase User-Agent ^Java/1\. botsucker=yes
    :)

  8. Jo Says:

    Is this meant to go inside an .htaccess file?

  9. frank Says:

    Why not block all Java bots? It works for me like this:

    SetEnvIfNoCase User-Agent "java/" bad_bot
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot

    (save as .htaccess)

    I’ve noticed on my site that some spammers use the same IP address over and over again. But the smart ones use varying IP addresses, and these are the worst.
    My question is: how can you block these when they look like a normal browser?
