Omni Explorer gobbles 300 megabytes
I had the Omni Explorer on one of my sites a few days ago, and thought it was aggressive.
Unfortunately I didn’t do anything about it.
Yesterday it hit NativeCelebs. That’s a HUGE site, and the bot proceeded to gobble up 304.24 MB (according to AWStats).
I checked the raw log, and that IP number didn’t hit my site until
[22/May/2005:09:17:08 -0400]
and stopped
[22/May/2005:10:37:26 -0400]
IP number:
64.71.131.121 (nativecelebs)
64.71.131.120 (spamhuntress)
User agent:
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer
If you’ve got a large site, block it fast!
I had similar hits before (just a few)
64.62.175.131
OmniExplorer_Bot/1.09 (+http://www.omni-explorer.com) Cars Crawler
I’ve had one access from that IP block before:
64.71.131.107
A normal browser UA, but didn’t load any extra files. Had a referrer from a site that links to me and went after the spampop page. Must have been a bot.
On NativeCelebs I’ve had a number of accesses from both the Omni Explorer UA and a normal browser UA from that IP block. I’ll find them and collate them here. All of the bots send full, normal referrers: wherever they came in from, that’s the referrer they leave. The same IP number can have the Omni UA one day and a normal browser UA another day. And apart from the gorge fest yesterday, I find the accesses one at a time, or a few at a time, starting May 16, 2005:
64.71.131.107
64.71.131.108
64.71.131.109
64.71.131.110
64.71.131.111
64.71.131.112
64.71.131.114
64.71.131.115
64.71.131.120
64.71.131.121
In April I also had visits from this family of bots. And back then they came from a different IP block:
64.62.175.133-64.62.175.137
May 26th, 2005 at 8:06 am
Haha, this Omni-expl bot crossed my blog a month ago, and wrecked my stats. I blocked it right away ;x Some other bot (MSN-bot, according to the stats, dunno whether that’s true, never noticed the msn bot before so..) was indexing tons of non-existing files. Didn’t really mess up the bandwidth, but the number of request errors was enormous! =)
June 3rd, 2005 at 12:55 am
The Omni crawled my site as well under the following name:
OmniExplorer_Bot/1.07
Added it to my robots.txt file
I’m pretty ignorant about these things, though. Does this mean if a “User-agent: OmniExplorer_Bot/1.08″ comes along, my block is useless? Thanks to anybody who knows!
Moderator: Removed spammy URL from URL field.
June 10th, 2005 at 6:28 am
I also just got hit by the OmniExplorer_Bot. Below is the info I have:
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Internet Categorizer
And the IP it used was: 65.19.150.248
I’m fairly sure that the bot did NOT request my robots.txt file.
June 10th, 2005 at 12:10 pm
I just got hit from the OmniExplorer_Bot this morning.
The IP it used was 65.19.150.248
It did not check for my robots.txt file. Not that it would have mattered, though, since it’s not blocked.
I do not have a large site. Do you think this bot will cause any problems?
June 10th, 2005 at 12:34 pm
If your server can handle it, and you have enough bandwidth, it shouldn’t be a problem.
But the bot is causing problems in general. It just keeps switching IP addresses to avoid the bans.
I’m thinking it’s beginning to deserve a place on a blocklist, just for being what it is. Meaning any new IP range it uses gets listed after a while. Hopefully it’ll gain enough of a bad rep it’ll run out of hosts eventually. Or end up on hosts so slow it will be less of a problem…
June 12th, 2005 at 4:52 pm
On June 8, I got 4700 hits (200 MB) from 65.19.150.252 on my spam trap. At the very beginning of my homepage there is a hidden link that points to a CGI that waits a few seconds and then produces garbage and nonexistent email addresses.
I can recommend this if you have a website with an (almost) infinite space of dynamic pages. This way, spambots won’t put a load on the dynamic pages that take more CPU time. If it was harvesting email addresses, then it now has 20k nonexistent addresses, which also is a pleasant feeling.
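A minimal Python sketch of such a trap page, not the commenter’s actual CGI. The function names, the delay value, and the use of the reserved `.invalid` top-level domain (so no real mailbox can ever be hit) are all assumptions made for illustration.

```python
import random
import string
import time


def fake_addresses(count, seed=None):
    """Return `count` made-up addresses under the reserved .invalid TLD."""
    rng = random.Random(seed)
    addresses = []
    for _ in range(count):
        user = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 10)))
        host = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(4, 8)))
        addresses.append(f"{user}@{host}.invalid")
    return addresses


def trap_page(count=50, seed=None):
    """Build an HTML page of mailto links for a harvester to collect."""
    links = "\n".join(
        f'<a href="mailto:{addr}">{addr}</a><br>'
        for addr in fake_addresses(count, seed)
    )
    return f"<html><body>\n{links}\n</body></html>"


def respond(delay_seconds=3):
    """CGI-style response: stall first, then emit the garbage page."""
    time.sleep(delay_seconds)  # hypothetical delay; slows the crawler down
    print("Content-Type: text/html")
    print()
    print(trap_page())
```

The delay is what keeps the trap cheap: the bot spends its time waiting on a page that costs almost nothing to generate.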
June 13th, 2005 at 1:01 pm
I got about 4500 hits from OmniExplorer before I googled and found this informative page. I have blocked the IPs, but also the UserAgent:
SetEnvIfNoCase User-Agent "^OmniExplorer" bad_bot
order allow,deny
allow from all
deny from 64.71.131.107
# and the other IP addresses
deny from env=bad_bot
June 14th, 2005 at 11:09 am
I’m also getting hit by this bot. I have a form with JavaScript validation on it, and the bot appears to be doing a POST on it, which is irritating my client. I’m gonna try the above user-agent block. Will that catch all instances of it?
June 14th, 2005 at 11:16 am
No, it won’t. The bot sometimes uses a regular Firefox user agent. And they switch IP blocks very often.
You may want to think about blocking Hurricane Electric’s IP blocks. There may be some collateral damage. They ARE known as a spam host in general, though.
June 15th, 2005 at 2:58 am
I’ve compiled the netblocks for Hurricane Electric, along with the CIDR notation and netmasks, in case you want to block ALL of their network. The top part is their network info, and the bottom is the network IP/Subnet Mask for your firewall or IP denied list.
Hurricane Electric IP blocks from ARIN:
Hurricane Electric HURRICANE-DC0013-2 (NET-216-218-130-136-1) 216.218.130.136 - 216.218.130.143 /29 255.255.255.248
Hurricane Electric HURRICANE-1 (NET-216-218-128-0-1) 216.218.128.0 - 216.218.255.255 /17 255.255.128.0
Hurricane Electric HURRICANE-2 (NET-64-71-128-0-1) 64.71.128.0 - 64.71.191.255 /18 255.255.192.0
Hurricane Electric HURRICANE-4 (NET-65-19-128-0-1) 65.19.128.0 - 65.19.191.255 /18 255.255.192.0
Hurricane Electric HURRICANE-3 (NET-66-220-0-0-1) 66.220.0.0 - 66.220.31.255 /19 255.255.224.0
Hurricane Electric HURRICANE-5 (NET-209-51-160-0-1) 209.51.160.0 - 209.51.191.255 /19 255.255.224.0
Hurricane Electric HURRICANE-6 (NET-216-66-0-0-1) 216.66.0.0 - 216.66.95.255 /19 255.255.224.0
Hurricane Electric HURRICANE-4 (NET-64-62-128-0-1) 64.62.128.0 - 64.62.255.255 /17 255.255.128.0
Hurricane Electric HURRICANE-7 (NET-66-160-128-0-1) 66.160.128.0 - 66.160.207.255 /17 255.255.128.0
Hurricane Electric HURRICANE-DC0012-2769 (NET-216-218-130-128-1) 216.218.130.128 - 216.218.130.135 /29 255.255.255.248
Hurricane Electric HURRICANE-DC0012-151 (NET-64-71-191-56-1) 64.71.191.56 - 64.71.191.63 /29 255.255.255.248
Hurricane Electric HURRICANE-DC0012-262 (NET-66-220-4-240-1) 66.220.4.240 - 66.220.4.255 /28 255.255.255.240
Hurricane Electric HURRICANE-DC0043-131 (NET-216-218-229-80-1) 216.218.229.80 - 216.218.229.95 /28 255.255.255.240
Hurricane Electric HURRICANE-DC0043-151 (NET-216-218-158-16-1) 216.218.158.16 - 216.218.158.31 /28 255.255.255.240
Denied IP and Subnet Mask Listing
216.218.130.136 255.255.255.248
216.218.128.0 255.255.128.0
64.71.128.0 255.255.192.0
65.19.128.0 255.255.192.0
66.220.0.0 255.255.224.0
209.51.160.0 255.255.224.0
216.66.0.0 255.255.224.0
64.62.128.0 255.255.128.0
66.160.128.0 255.255.128.0
216.218.130.128 255.255.255.248
64.71.191.56 255.255.255.248
66.220.4.240 255.255.255.240
216.218.229.80 255.255.255.240
216.218.158.16 255.255.255.240
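As a cross-check on the list above, here is a small sketch using Python’s standard `ipaddress` module to turn a first/last address pair into exact CIDR blocks and netmasks. The helper names are made up for this example. Note that two of the listed ranges (216.66.0.0 - 216.66.95.255 and 66.160.128.0 - 66.160.207.255) do not fit the single prefix shown next to them, so each needs two blocks to be covered exactly.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


def netmask(cidr):
    """Return the dotted-quad netmask for a CIDR block."""
    return str(ipaddress.IPv4Network(cidr).netmask)


def is_blocked(ip, cidrs):
    """True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in ipaddress.IPv4Network(cidr) for cidr in cidrs)
```

For example, `range_to_cidrs("216.66.0.0", "216.66.95.255")` gives `216.66.0.0/18` plus `216.66.64.0/19`, and `is_blocked("65.19.150.248", ["65.19.128.0/18"])` confirms that the address several commenters reported sits inside the 65.19.128.0 block.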
June 15th, 2005 at 3:09 am
This list may be easier to read than the last:
Denied IP Listing and Netmask for Hurricane Electric
216.218.130.136 255.255.255.248
216.218.128.0 255.255.128.0
64.71.128.0 255.255.192.0
65.19.128.0 255.255.192.0
66.220.0.0 255.255.224.0
209.51.160.0 255.255.224.0
216.66.0.0 255.255.224.0
64.62.128.0 255.255.128.0
66.160.128.0 255.255.128.0
216.218.130.128 255.255.255.248
64.71.191.56 255.255.255.248
66.220.4.240 255.255.255.240
216.218.229.80 255.255.255.240
216.218.158.16 255.255.255.240
June 18th, 2005 at 4:54 pm
To kinda build on top of what some other posters mentioned, what would you guys think of this automated process:
1) Set up honeypot web pages on your site. Those are pages no valid end user or valid bot should ever hit.
2) Place prominent links to those honeypot pages everywhere on your *legit* pages, but hide those links using CSS after giving them a class attribute. You use display:none; and/or visibility:hidden. You could even use JavaScript to nullify the URLs, so a honeypot link might look like: click me, with a CSS block somewhere that declares: .honeypotStyle {display:none;visibility:hidden;}. I’m not sure whether that’ll handle mobile devices too well. You might also check which HTTP headers are being sent. A normal browser will send a slew of HTTP headers, such as the “Accept:” one. If a bot author is lazy/ignorant, he won’t send those.
3) Place the honeypot URLs in a robots.txt file at the root of your site, and be sure to deny all. From here on, all legit bots will ignore your honeypot URLs.
4) Set up an automated process by which an IP address gets blacklisted/thoroughly logged with ALL HTTP headers that were sent in the request/reported/let-your-imagination-run-wild as soon as it hits your honeypot URL.
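Step 4 could be sketched roughly like this in Python. It is only an in-memory illustration under assumed names: the `HoneypotBlacklist` class, the `/trap/` path, and the status codes are hypothetical, and a real setup would hook into the web server or firewall instead.

```python
class HoneypotBlacklist:
    """Block any client that requests a honeypot URL, and keep its headers."""

    def __init__(self, honeypot_paths):
        # Paths that are hidden from humans and disallowed in robots.txt.
        self.honeypot_paths = set(honeypot_paths)
        # ip -> copy of the headers sent with the offending request
        self.blocked = {}

    def handle(self, ip, path, headers):
        """Return an HTTP status: 403 for blocked clients, otherwise 200."""
        if ip in self.blocked:
            return 403
        if path in self.honeypot_paths:
            self.blocked[ip] = dict(headers)  # log ALL headers, as in step 4
            return 403
        return 200

    @staticmethod
    def looks_lazy(headers):
        """True if the request has no Accept header (the hint from step 2)."""
        return not any(name.lower() == "accept" for name in headers)


# Hypothetical usage: "/trap/" stands in for whatever honeypot URL you pick.
blacklist = HoneypotBlacklist({"/trap/"})
```

Keeping the captured headers alongside the IP makes it easier to tell afterwards whether the visitor was a known crawler, a disguised bot, or a person who disabled CSS.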
Would this work?
-chris
June 18th, 2005 at 4:56 pm
ugh sorry i guess the html i had in my post did get literally rendered, what i meant to say about javascript is add the following attribute to your “a” link tags: onmouseover="this.href='about:blank';"
June 19th, 2005 at 7:17 am
They’ve just put up a notice on their site, apologizing for how hungry the bot was and saying it will obey robots.txt.
There’s no word about why it sometimes comes with a user agent that looks like a regular browser.
Frankly, I believe they need to include more than what they have. They need to include the IP range, so we can see who’s them and who’s a copycat.
June 20th, 2005 at 5:18 pm
Hi
I am a newbie at blocking bots. This bot has been gulping more and more bandwidth on my site. Can anybody please give me instructions on how to block this bot, with the blocking code for robots.txt?
I have two Omni bots on my website:
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Interne
OmniExplorer_Bot/1.10 (+http://www.omni-explorer.com) Jobs Cr
Thanks in advance.
June 20th, 2005 at 5:24 pm
They say it’s going to obey robots.txt from now on. So read up on that.
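For anyone wondering what such a rule looks like, and whether a version bump in the user agent defeats it, here is a hedged sketch. It uses Python’s standard `urllib.robotparser` as a stand-in for a well-behaved crawler; how OmniExplorer itself matches rules is not known from this thread. The two-line robots.txt names the bot without a version number, and the `is_allowed` helper is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Rule keyed on the bot's name only, with no version number.
ROBOTS_TXT = """\
User-agent: OmniExplorer_Bot
Disallow: /
"""


def is_allowed(user_agent, url="/", robots_txt=ROBOTS_TXT):
    """Ask Python's robots.txt parser whether `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this parser, `is_allowed("OmniExplorer_Bot/1.07")` and `is_allowed("OmniExplorer_Bot/1.08")` are both false, while a browser-style user agent is still allowed. That last case is also the limit of the approach: a rule like this only helps against requests that identify themselves as the bot and actually read robots.txt.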
June 21st, 2005 at 3:47 pm
(Moderator, please delete previous comment, I messed up the html tags).
To Chris about the honeypots: that’s basically what I did except for the automated IP blocking. The hidden link to the honeypot is the first link on each of my pages. I do it like this:
<small><small><a class="hidden" rel="nofollow" href="blah/">email</a></small></small>
to discourage the search engines from ranking my pages based on what’s in the link text (the word “email”). The nofollow will prevent Google from indexing the linked urls (even if it is not crawled due to robots.txt, Google will index the URL if there are links pointing to it).
December 3rd, 2005 at 1:08 pm
Just got through blocking them via firewall rules since they wandered into robots.txt restricted space. It was nice of ‘em to post their IP ranges on their site, but now after googling about them and learning just how evil they really are, I wouldn’t be surprised if they happened to omit a few blocks on that list.
I wrote to their feedback email address telling ‘em that their robot didn’t honor robots.txt properly [ didn’t honor “User-agent: *” block at all ], and I got a pair of emails from an ‘Anton Stanley’ saying that he apologized, that their robot was having troubles lately, and that they were planning on launching several ‘vertical search engines’ by the end of the year, and they could drive quality traffic to our site. And he asked if we’d just go back to blocking ‘em via a user-agent-specific robots.txt block. The answer was no. I hope their robot stumbles and gets gummed up bad when their packets are routed to /dev/null.
January 28th, 2006 at 7:52 pm
I just had a visit on 1/22/06 and 1/23/06 from 65.19.150.248
It gobbled 3GB in bandwidth over 61,000+ page hits in those 2 days.
My robots.txt file wasn’t followed.
Just thought I’d share.
November 26th, 2006 at 4:11 pm
We have been blacklisting and null routing all traffic that originates from Hurricane Electric for over two years and have not had one single complaint from a valid sender. Therefore we concluded there are no non-spammers on he.net networks…