Omni Explorer gobbles 300 megabytes

Analysis of Omni-Explorer

I had the Omni Explorer on one of my sites a few days ago, and thought it was aggressive.

Unfortunately I didn’t do anything about it.

Yesterday it hit NativeCelebs. That’s a HUGE site, and the bot proceeded to gobble up 304.24 MB (according to AWStats).

I checked the raw log. That IP number first hit my site at
[22/May/2005:09:17:08 -0400]
and stopped at
[22/May/2005:10:37:26 -0400]
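
To put those two timestamps and the AWStats figure together, here is a rough sketch of the arithmetic. It assumes AWStats counts 1 MB as 1024 KiB and that the whole 304.24 MB fell between the first and last hit; the function name is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the raw access log lines quoted above.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def crawl_duration_and_rate(first_hit, last_hit, megabytes):
    """Return (duration in seconds, average KiB per second).

    Assumes 1 MB = 1024 KiB, which is a guess about how the
    stats package counts, not something the log confirms.
    """
    start = datetime.strptime(first_hit, LOG_TIME_FORMAT)
    end = datetime.strptime(last_hit, LOG_TIME_FORMAT)
    seconds = (end - start).total_seconds()
    return seconds, megabytes * 1024 / seconds


seconds, rate = crawl_duration_and_rate(
    "22/May/2005:09:17:08 -0400",
    "22/May/2005:10:37:26 -0400",
    304.24,
)
print(f"{seconds / 60:.1f} minutes, roughly {rate:.0f} KiB/s on average")
```

That is a bit over 80 minutes of sustained downloading from a single crawler.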

IP number:
64.71.131.121 (nativecelebs)
64.71.131.120 (spamhuntress)
User agent:
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer

If you’ve got a large site, block it fast!

I had similar hits before (just a few)
64.62.175.131
OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Cars Crawler

I’ve had one access from that IP block before:
64.71.131.107
A normal browser UA, but didn’t load any extra files. Had a referrer from a site that links to me and went after the spampop page. Must have been a bot.

On NativeCelebs I’ve had a number of accesses from both the Omni Explorer UA and a normal browser UA from that IP block. I’ll find them and collate them here. All of the bots have full normal referrers. Wherever they came in from, that’s the referrer they leave. The same IP number can have the Omni UA one day and a normal browser UA another day. And apart from the gorge fest yesterday, I find the accesses one at a time, or a few at a time, starting May 16, 2005:

64.71.131.107
64.71.131.108
64.71.131.109
64.71.131.110
64.71.131.111
64.71.131.112
64.71.131.114
64.71.131.115
64.71.131.120
64.71.131.121

In April I also had visits from this family of bots. And back then they came from a different IP block:
64.62.175.133-64.62.175.137

Earlier post about this bot

20 Responses to “Omni Explorer gobbles 300 megabytes”

  1. Rev says:

    Haha, this Omni-expl bot crossed my blog a month ago, and wrecked my stats. I blocked it right away ;x Some other bot (MSN-bot, according to the stats, dunno whether that’s true, never noticed the MSN bot before so..) was indexing tons of non-existing files. Didn’t really mess up the bandwidth, but the number of request errors was enormous! =)

  2. Darin says:

    The Omni crawled my site as well under the following name:

    OmniExplorer_Bot/1.07

    Added it to my robots.txt file

    I’m pretty ignorant about these things, though. Does this mean if a "User-agent: OmniExplorer_Bot/1.08" comes along, my block is useless? Thanks to anybody who knows!

    Moderator: Removed spammy URL from URL field.
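
On the version-number question: a minimal sketch using Python's standard-library robots.txt parser, assuming the rule names the bot without a version. It only shows how that one parser reads such a rule. Whether the bot itself honours it is a separate matter, and other comments in this thread report it never requested robots.txt at all. The 1.08 user agent is the hypothetical from the question, and the Firefox string is just an illustrative browser UA.

```python
from urllib.robotparser import RobotFileParser

# A version-less rule: the token is the bot name only, no "/1.07".
ROBOTS_TXT = """\
User-agent: OmniExplorer_Bot
Disallow: /
"""


def is_allowed(user_agent, path="/"):
    """Ask the stdlib parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


for ua in (
    "OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer",
    "OmniExplorer_Bot/1.08 (+http://www.omni-explorer.com) Internet Categorizer",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4",
):
    print("allowed" if is_allowed(ua) else "blocked", "-", ua)
```

Leaving the version out of the User-agent line is the safer habit, since a well-behaved parser matches on the bot name rather than the full string.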

  3. Alan Moss says:

    I also just got hit by the OmniExplorer_Bot. Below is the info I have:

    OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer

    And the IP it used was: 65.19.150.248

    I’m fairly sure that the bot did NOT request my robots.txt file.

  4. Craig says:

    I just got hit from the OmniExplorer_Bot this morning.

    The IP it used was 65.19.150.248

    It did not check for my robots.txt file. Not that it would have mattered though, since it’s not blocked.

    I do not have a large site. Do you think this bot will cause any problems?

  5. Administrator says:

    If your server can handle it, and you have enough bandwidth, it shouldn’t be a problem.

    But the bot is causing problems in general. It just keeps switching IP addresses to avoid the bans.

    I’m thinking it’s beginning to deserve a place on a blocklist, just for being what it is. Meaning any new IP range it uses gets listed after a while. Hopefully it’ll gain enough of a bad rep it’ll run out of hosts eventually. Or end up on hosts so slow it will be less of a problem…

  6. Hankwang says:

    On June 8, I got 4700 hits (200 MB) from 65.19.150.252 on my spam trap. At the very beginning of my homepage there is a hidden link that points to a CGI that waits a few seconds and then produces garbage and nonexistent email addresses.

    I can recommend this if you have a website with an (almost) infinite space of dynamic pages. This way, spambots won’t put a load on the dynamic pages that take more CPU time. If it was harvesting email addresses, then it now has 20k nonexistent addresses, which also is a pleasant feeling.
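
For anyone wanting to try a trap like this, here is a small sketch of the garbage-producing part. It is not Hankwang's CGI; the function names are invented, the delay he mentions is left out, and the addresses use the reserved .invalid top-level domain so they can never belong to a real mailbox.

```python
import random
import string


def fake_addresses(count, seed=None):
    """Return `count` made-up addresses under the reserved .invalid TLD."""
    rng = random.Random(seed)
    letters = string.ascii_lowercase
    addresses = []
    for _ in range(count):
        user = "".join(rng.choices(letters, k=rng.randint(5, 10)))
        host = "".join(rng.choices(letters, k=rng.randint(4, 8)))
        addresses.append(f"{user}@{host}.invalid")
    return addresses


def trap_page(count=20, seed=None):
    """Build a throwaway HTML page full of mailto links for harvesters."""
    links = "\n".join(
        f'<a href="mailto:{addr}">{addr}</a><br>'
        for addr in fake_addresses(count, seed)
    )
    return f"<html><body>\n{links}\n</body></html>"


print(trap_page(count=3, seed=1))
```

A static generator like this costs almost nothing per request, which is the point of steering bots away from the CPU-heavy dynamic pages.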

  7. Vegard says:

    I got about 4500 hits from OmniExplorer before I googled and found this informative page. I have blocked the IPs, but also the UserAgent:

    # Flag any request whose User-Agent starts with "OmniExplorer" (case-insensitive)
    SetEnvIfNoCase User-Agent "^OmniExplorer" bad_bot

    order allow,deny
    allow from all
    deny from 64.71.131.107
    # and the other IP addresses
    deny from env=bad_bot
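
To see which requests a pattern like that would actually catch, here is a quick sketch that applies the same case-insensitive prefix match in Python. The first three user agents are the ones reported in this post and thread; the Firefox string is only an illustrative example of a normal browser UA.

```python
import re

# Same idea as the SetEnvIfNoCase pattern: prefix match, ignoring case.
BAD_BOT = re.compile(r"^OmniExplorer", re.IGNORECASE)


def is_bad_bot(user_agent):
    """True if the User-Agent string starts with 'OmniExplorer'."""
    return BAD_BOT.search(user_agent) is not None


samples = [
    "OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer",
    "OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Cars Crawler",
    "omniexplorer_bot/1.10",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4",
]
for ua in samples:
    print("deny " if is_bad_bot(ua) else "allow", ua)
```

Every version of the named bot is caught, but a visit that presents a browser user agent is not, so the IP rules still have to carry that part.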

  8. brian says:

    I’m also getting hit by this bot. I have a form with JavaScript validation on it, and it appears to be doing a POST on it, which is irritating my client. I’m gonna try the above user-agent block. Will that catch all instances of it?

  9. Administrator says:

    No, it won’t. The bot sometimes uses a regular Firefox user agent. And they switch IP blocks very often.

    You may want to think about blocking Hurricane Electric’s IP blocks. There may be some collateral damage. They ARE known as a spam host in general, though.

  10. I’ve compiled the netblocks for Hurricane Electric, along with the CIDR notation and netmasks, in case you want to block ALL of their network. The top part is their network info, and the bottom is the network IP/Subnet Mask for your firewall or IP denied list.

    Hurricane Electric IP blocks from ARIN:

    Hurricane Electric HURRICANE-DC0013-2 (NET-216-218-130-136-1) 216.218.130.136 - 216.218.130.143 /29 255.255.255.248
    Hurricane Electric HURRICANE-1 (NET-216-218-128-0-1) 216.218.128.0 - 216.218.255.255 /17 255.255.128.0
    Hurricane Electric HURRICANE-2 (NET-64-71-128-0-1) 64.71.128.0 - 64.71.191.255 /18 255.255.192.0
    Hurricane Electric HURRICANE-4 (NET-65-19-128-0-1) 65.19.128.0 - 65.19.191.255 /18 255.255.192.0
    Hurricane Electric HURRICANE-3 (NET-66-220-0-0-1) 66.220.0.0 - 66.220.31.255 /19 255.255.224.0
    Hurricane Electric HURRICANE-5 (NET-209-51-160-0-1) 209.51.160.0 - 209.51.191.255 /19 255.255.224.0
    Hurricane Electric HURRICANE-6 (NET-216-66-0-0-1) 216.66.0.0 - 216.66.95.255 /19 255.255.224.0
    Hurricane Electric HURRICANE-4 (NET-64-62-128-0-1) 64.62.128.0 - 64.62.255.255 /17 255.255.128.0
    Hurricane Electric HURRICANE-7 (NET-66-160-128-0-1) 66.160.128.0 - 66.160.207.255 /17 255.255.128.0
    Hurricane Electric HURRICANE-DC0012-2769 (NET-216-218-130-128-1) 216.218.130.128 - 216.218.130.135 /29 255.255.255.248
    Hurricane Electric HURRICANE-DC0012-151 (NET-64-71-191-56-1) 64.71.191.56 - 64.71.191.63 /29 255.255.255.248
    Hurricane Electric HURRICANE-DC0012-262 (NET-66-220-4-240-1) 66.220.4.240 - 66.220.4.255 /28 255.255.255.240
    Hurricane Electric HURRICANE-DC0043-131 (NET-216-218-229-80-1) 216.218.229.80 - 216.218.229.95 /28 255.255.255.240
    Hurricane Electric HURRICANE-DC0043-151 (NET-216-218-158-16-1) 216.218.158.16 - 216.218.158.31 /28 255.255.255.240

    Denied IP and Subnet Mask Listing
    216.218.130.136 255.255.255.248
    216.218.128.0 255.255.128.0
    64.71.128.0 255.255.192.0
    65.19.128.0 255.255.192.0
    66.220.0.0 255.255.224.0
    209.51.160.0 255.255.224.0
    216.66.0.0 255.255.224.0
    64.62.128.0 255.255.128.0
    66.160.128.0 255.255.128.0
    216.218.130.128 255.255.255.248
    64.71.191.56 255.255.255.248
    66.220.4.240 255.255.255.240
    216.218.229.80 255.255.255.240
    216.218.158.16 255.255.255.240

  11. This list may be easier to read than the last:

    Denied IP Listing and Netmask for Hurricane Electric
    216.218.130.136 255.255.255.248
    216.218.128.0 255.255.128.0
    64.71.128.0 255.255.192.0
    65.19.128.0 255.255.192.0
    66.220.0.0 255.255.224.0
    209.51.160.0 255.255.224.0
    216.66.0.0 255.255.224.0
    64.62.128.0 255.255.128.0
    66.160.128.0 255.255.128.0
    216.218.130.128 255.255.255.248
    64.71.191.56 255.255.255.248
    66.220.4.240 255.255.255.240
    216.218.229.80 255.255.255.240
    216.218.158.16 255.255.255.240
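
If you would rather test an address against that list than eyeball it, here is a minimal sketch using Python's standard ipaddress module. The entries are copied from the listing above, with the /17 masks written out in full as 255.255.128.0 (an assumption about what the shortened form meant). The list is only as accurate as the comment it came from, and the function name is made up.

```python
import ipaddress

# "network netmask" pairs from the denied list above.
DENIED = """\
216.218.130.136 255.255.255.248
216.218.128.0 255.255.128.0
64.71.128.0 255.255.192.0
65.19.128.0 255.255.192.0
66.220.0.0 255.255.224.0
209.51.160.0 255.255.224.0
216.66.0.0 255.255.224.0
64.62.128.0 255.255.128.0
66.160.128.0 255.255.128.0
216.218.130.128 255.255.255.248
64.71.191.56 255.255.255.248
66.220.4.240 255.255.255.240
216.218.229.80 255.255.255.240
216.218.158.16 255.255.255.240
"""

# ip_network accepts "address/netmask" as well as "address/prefix".
NETWORKS = [
    ipaddress.ip_network("/".join(line.split()))
    for line in DENIED.splitlines()
]


def is_denied(ip):
    """True if ip falls inside any of the listed networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in NETWORKS)


for ip in ("64.71.131.121", "64.62.175.131", "65.19.150.248", "192.0.2.1"):
    print(ip, "denied" if is_denied(ip) else "not listed")
```

All of the crawler addresses reported in this thread fall inside the listed blocks; 192.0.2.1 is a documentation address used here as a control.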

  12. To kinda build on top of what some other posters mentioned, what would you guys think of this automated process:

    1) Set up honeypot web pages on your site. Those are pages no valid end-user or valid bot should ever hit.

    2) Place prominent links to those honeypot pages everywhere on your *legit* pages, but hide those links using CSS after giving them a class attribute. You use display:none; and/or visibility:hidden. You could even use JavaScript to nullify the URLs, so a honeypot link might look like: click me with a CSS block somewhere that declares: .honeypotStyle {display:none;visibility:hidden;}. I’m not sure whether that’ll handle mobile devices too well. You might also check which HTTP headers are being sent. A normal browser will send a slew of HTTP headers such as the "Accept:" one. If a bot author is lazy/ignorant, he won’t send those.

    3) Place the honeypot URLs in a robots.txt file at the root of your site, and be sure to deny all. From here, all legit bots will ignore your honeypot URLs.

    4) Set up an automated process by which an IP address gets blacklisted/thoroughly logged with ALL HTTP headers that were sent in the request/reported/let-your-imagination-run-wild as soon as it hits your honeypot URL.

    Would this work?

    -chris
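
Step 4 of that process could be sketched roughly like this. It is only an in-memory illustration: the class name, the /trap/ path, and the 403 response are assumptions for the example, and a real setup would need to persist the blacklist and hook into the web server or firewall.

```python
class HoneypotGuard:
    """Blacklist any IP address that requests a honeypot URL."""

    def __init__(self, honeypot_paths):
        self.honeypot_paths = set(honeypot_paths)
        self.blacklist = set()
        self.hit_log = []  # full request details for each honeypot hit

    def handle(self, ip, path, headers):
        """Return the HTTP status code to answer this request with."""
        if ip in self.blacklist:
            return 403
        if path in self.honeypot_paths:
            # Record every header that was sent, then ban the address.
            self.blacklist.add(ip)
            self.hit_log.append(
                {"ip": ip, "path": path, "headers": dict(headers)}
            )
            return 403
        return 200


def missing_accept_header(headers):
    """True if the request carries no Accept header (the hint from step 2)."""
    return "accept" not in {name.lower() for name in headers}


guard = HoneypotGuard(["/trap/"])  # hypothetical honeypot path
print(guard.handle("203.0.113.9", "/index.html", {"Accept": "text/html"}))
print(guard.handle("203.0.113.9", "/trap/", {"User-Agent": "SomeBot"}))
print(guard.handle("203.0.113.9", "/index.html", {"Accept": "text/html"}))
```

Once the address touches the trap, every later request from it is refused, including requests for legitimate pages.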

  13. Ugh, sorry, I guess the HTML I had in my post did get literally rendered. What I meant to say about JavaScript is: add the following attribute to your "a" link tags: onmouseover="this.href='about:blank';"

  14. Administrator says:

    They’ve just put up a notice on their site, apologizing for how hungry the bot was and saying it will obey robots.txt.

    There’s no word about why it sometimes comes with a user agent that looks like a regular browser.

    Frankly, I believe they need to include more than what they have. They need to include the IP range, so we can see who’s them and who’s a copycat.

  15. Anonymous says:

    Hi
    I am a newbie at blocking bots. This bot has been gulping more and more bandwidth on my site. Can anybody please give me instructions on how to block this bot, with the blocking code for robots.txt?
    I have two Omni bots on my website:
    OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Interne
    OmniExplorer_Bot/1.10 (+http://www.omni-explorer.com) Jobs Cr
    Thanks in advance.

  16. Administrator says:

    They say it’s going to obey robots.txt from now on. So read up on that.

  17. Hankwang says:

    (Moderator, please delete previous comment, I messed up the html tags).

    To Chris about the honeypots: that’s basically what I did except for the automated IP blocking. The hidden link to the honeypot is the first link on each of my pages. I do it like this:

    <small><small><a class="hidden" rel="nofollow" href="blah/">email</a></small></small>

    to discourage the search engines from ranking my pages based on what’s in the link text (the word “email”). The nofollow will prevent Google from indexing the linked urls (even if it is not crawled due to robots.txt, Google will index the URL if there are links pointing to it).

  18. James Robinson says:

    Just got through blocking them via firewall rules since they wandered into robots.txt restricted space. It was nice of ‘em to post their IP ranges on their site, but now after googling about them and learning just how evil they really are, I wouldn’t be surprised if they happened to omit a few blocks on that list.

    I wrote to their feedback email address telling ‘em that their robot didn’t honor robots.txt properly [ didn't honor "User-agent: *" block at all ], and I got a pair of emails from an ‘Anton Stanley’ saying that he apologized, that their robot was having troubles lately, and that they were planning on launching several ‘vertical search engines’ by the end of the year, and they could drive quality traffic to our site. And he asked if we’d just go back to blocking ‘em via a user-agent-specific robots.txt block. The answer was no. I hope their robot stumbles and gets gummed up bad when their packets are routed to /dev/null.

  19. Irate Webmaster says:

    I just had a visit on 1/22/06 and 1/23/06 from 65.19.150.248

    It gobbled 3GB in bandwidth over 61,000+ page hits in those 2 days.

    My robots.txt file wasn’t followed.

    Just thought I’d share.

  20. gotit all undercontol says:

    We have been blacklisting and null routing all traffic that originates from Hurricane Electric for over two years and have not had one single complaint from a valid sender. Therefore we concluded there are no non-spammers on he.net networks…
