<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.0.7" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Omni Explorer gobbles 300 megabytes</title>
	<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/</link>
	<description>Just another WordPress weblog</description>
	<pubDate>Thu, 20 Nov 2008 22:13:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.7</generator>

	<item>
		<title>by: gotit all undercontol</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-67749</link>
		<pubDate>Sun, 26 Nov 2006 22:11:32 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-67749</guid>
					<description>We have been blacklisting and null routing all traffic that originates from Hurricane Electric for over two years and have not had one single complaint from a valid sender. Therefore we concluded there are no non-spammers on he.net networks...</description>
		<content:encoded><![CDATA[<p>We have been blacklisting and null routing all traffic that originates from Hurricane Electric for over two years and have not had one single complaint from a valid sender. Therefore we concluded there are no non-spammers on he.net networks&#8230;
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Irate Webmaster</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-2879</link>
		<pubDate>Sun, 29 Jan 2006 01:52:46 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-2879</guid>
					<description>I just had a visit on 1/22/06 and 1/23/06 from 65.19.150.248

It gobbled 3GB in bandwidth over 61,000+ page hits in those 2 days.

My robots.txt file wasn't followed.

Just thought I'd share.</description>
		<content:encoded><![CDATA[<p>I just had a visit on 1/22/06 and 1/23/06 from 65.19.150.248</p>
<p>It gobbled 3GB in bandwidth over 61,000+ page hits in those 2 days.</p>
<p>My robots.txt file wasn&#8217;t followed.</p>
<p>Just thought I&#8217;d share.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: James Robinson</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-1963</link>
		<pubDate>Sat, 03 Dec 2005 19:08:55 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-1963</guid>
					<description>Just got through blocking them via firewall rules since they wandered into robots.txt restricted space. It was nice of 'em to post their IP ranges on their site, but now after googling about them and learning just how evil they really are, I wouldn't be surprised if they happened to omit a few blocks on that list.

I wrote to their feedback email address telling 'em that their robot didn't honor robots.txt properly [ didn't honor "User-agent: *" block at all ], and I got a pair of emails from an 'Anton Stanley' saying that he apologized, that their robot was having troubles lately, and that they were planning on launching several 'vertical search engines' by the end of the year, and they could drive quality traffic to our site. And he asked if we'd just go back to blocking 'em via a user-agent-specific robots.txt block. The answer was no. I hope their robot stumbles and gets gummed up bad when their packets are routed to /dev/null.</description>
		<content:encoded><![CDATA[<p>Just got through blocking them via firewall rules since they wandered into robots.txt restricted space. It was nice of &#8216;em to post their IP ranges on their site, but now after googling about them and learning just how evil they really are, I wouldn&#8217;t be surprised if they happened to omit a few blocks on that list.</p>
<p>I wrote to their feedback email address telling &#8216;em that their robot didn&#8217;t honor robots.txt properly [ didn&#8217;t honor &#8220;User-agent: *&#8221; block at all ], and I got a pair of emails from an &#8216;Anton Stanley&#8217; saying that he apologized, that their robot was having troubles lately, and that they were planning on launching several &#8216;vertical search engines&#8217; by the end of the year, and they could drive quality traffic to our site. And he asked if we&#8217;d just go back to blocking &#8216;em via a user-agent-specific robots.txt block. The answer was no. I hope their robot stumbles and gets gummed up bad when their packets are routed to /dev/null.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Hankwang</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-905</link>
		<pubDate>Tue, 21 Jun 2005 21:47:46 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-905</guid>
					<description>(Moderator, please delete previous comment, I messed up the html tags).

To Chris about the honeypots: that's basically what I did except for the automated IP blocking. The hidden link to the honeypot is the first link on each of my pages. I do it like this:

&#60;small&#62;&#60;small&#62;&#60;a class="hidden" rel="nofollow" href="blah/"&#62;email&#60;/a&#62;&#60;/small&#62;&#60;/small&#62; 

to discourage the search engines from ranking my pages based on what's in the link text (the word "email"). The nofollow will prevent Google from indexing the linked urls (even if it is not crawled due to robots.txt, Google will index the URL if there are links pointing to it).</description>
		<content:encoded><![CDATA[<p>(Moderator, please delete previous comment, I messed up the html tags).</p>
<p>To Chris about the honeypots: that&#8217;s basically what I did except for the automated IP blocking. The hidden link to the honeypot is the first link on each of my pages. I do it like this:</p>
<p>&lt;small&gt;&lt;small&gt;&lt;a class="hidden" rel="nofollow" href="blah/"&gt;email&lt;/a&gt;&lt;/small&gt;&lt;/small&gt;</p>
<p>to discourage the search engines from ranking my pages based on what&#8217;s in the link text (the word &#8220;email&#8221;). The nofollow will prevent Google from indexing the linked urls (even if it is not crawled due to robots.txt, Google will index the URL if there are links pointing to it).
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Administrator</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-900</link>
		<pubDate>Mon, 20 Jun 2005 23:24:13 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-900</guid>
					<description>They say it's going to obey robots.txt from now on. So read up on that.</description>
		<content:encoded><![CDATA[<p>They say it&#8217;s going to obey robots.txt from now on. So read up on that.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Anonymous</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-899</link>
		<pubDate>Mon, 20 Jun 2005 23:18:25 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-899</guid>
					<description>Hi
I am a newbie at blocking bots. This bot has been gulping more and more bandwidth on my site. Can anybody please give me instructions on how to block this bot, with the blocking code for robots.txt?
I see two Omni bots on my website:
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Interne 
OmniExplorer_Bot/1.10 (+http://www.omni-explorer.com) Jobs Cr
Thanks in advance.</description>
		<content:encoded><![CDATA[<p>Hi<br />
I am a newbie at blocking bots. This bot has been gulping more and more bandwidth on my site. Can anybody please give me instructions on how to block this bot, with the blocking code for robots.txt?<br />
I see two Omni bots on my website:<br />
OmniExplorer_Bot/1.07 (+http://www.omni-explorer.com) Interne<br />
OmniExplorer_Bot/1.10 (+http://www.omni-explorer.com) Jobs Cr<br />
Thanks in advance.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Administrator</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-896</link>
		<pubDate>Sun, 19 Jun 2005 13:17:43 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-896</guid>
					<description>They've just put up a notice on their site, apologizing for how hungry the bot was and saying it will obey robots.txt.

There's no word about why it sometimes comes with a user agent that looks like a regular browser.

Frankly, I believe they need to include more than what they have. They need to include the IP range, so we can see who's them and who's a copycat.</description>
		<content:encoded><![CDATA[<p>They&#8217;ve just put up a notice on their site, apologizing for how hungry the bot was and saying it will obey robots.txt.</p>
<p>There&#8217;s no word about why it sometimes comes with a user agent that looks like a regular browser.</p>
<p>Frankly, I believe they need to include more than what they have. They need to include the IP range, so we can see who&#8217;s them and who&#8217;s a copycat.
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: chris holland</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-895</link>
		<pubDate>Sat, 18 Jun 2005 22:56:13 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-895</guid>
					<description>ugh sorry i guess the html i had in my post did get literally rendered, what i meant to say about javascript is add the following attribute to your "a" link tags: onmouseover="this.href='about:blank';"</description>
		<content:encoded><![CDATA[<p>ugh sorry i guess the html i had in my post did get literally rendered, what i meant to say about javascript is add the following attribute to your &#8220;a&#8221; link tags: onmouseover="this.href='about:blank';"
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: chris holland</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-894</link>
		<pubDate>Sat, 18 Jun 2005 22:54:00 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-894</guid>
					<description>To kinda build on top of what some other posters mentioned, what would you guys think of this automated process:

1) set-up honeypot web pages on your site. Those are pages no valid end-user or valid bot should ever hit.

2) place prominent links to those honeypot pages everywhere on your *legit* pages. But hide those links using CSS after giving them a class attribute. You can use display:none; and/or visibility:hidden. You could even use javascript to nullify the urls, so a honeypot link might look like: &lt;a class="honeypotStyle" href="/path/to/honeypot.html" rel="nofollow"&gt;click me&lt;/a&gt; with a css block somewhere that declares: .honeypotStyle {display:none;visibility:hidden;}. I'm not sure whether that'll handle mobile devices too well. You might also check for http headers being sent. A normal browser will send a slew of http headers such as the "Accept:" one. If a bot author is lazy/ignorant, he won't send those.

3) place the honeypot URLs in a robots.txt file at the root of your site, and be sure to deny them to all bots. From here, all legit bots will ignore your honeypot URLs.

4) set-up an automated process by which an IP address gets blacklisted/thoroughly logged with ALL http headers that were sent in the request/reported/let-your-imagination-run-wild as soon as it hits your honeypot url.

Would this work?

-chris</description>
		<content:encoded><![CDATA[<p>To kinda build on top of what some other posters mentioned, what would you guys think of this automated process:</p>
<p>1) set-up honeypot web pages on your site. Those are pages no valid end-user or valid bot should ever hit.</p>
<p>2) place prominent links to those honeypot pages everywhere on your *legit* pages. But hide those links using CSS after giving them a class attribute. You can use display:none; and/or visibility:hidden. You could even use javascript to nullify the urls, so a honeypot link might look like: <a class="honeypotStyle" href="/path/to/honeypot.html" rel="nofollow">click me</a> with a css block somewhere that declares: .honeypotStyle {display:none;visibility:hidden;}. I&#8217;m not sure whether that&#8217;ll handle mobile devices too well. You might also check for http headers being sent. A normal browser will send a slew of http headers such as the &#8220;Accept:&#8221; one. If a bot author is lazy/ignorant, he won&#8217;t send those.</p>
<p>3) place the honeypot URLs in a robots.txt file at the root of your site, and be sure to deny them to all bots. From here, all legit bots will ignore your honeypot URLs.</p>
<p>4) set-up an automated process by which an IP address gets blacklisted/thoroughly logged with ALL http headers that were sent in the request/reported/let-your-imagination-run-wild as soon as it hits your honeypot url.</p>
<p>Would this work?</p>
<p>-chris
</p>
]]></content:encoded>
				</item>
	<item>
		<title>by: Thomas Anderson</title>
		<link>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-877</link>
		<pubDate>Wed, 15 Jun 2005 09:09:58 +0000</pubDate>
		<guid>http://spamhuntress.com/2005/05/23/omni-explorer-gobbles-300-megabytes/#comment-877</guid>
					<description>This list may be easier to read than the last:

Denied IP Listing and Netmask for Hurricane Electric
216.218.130.136			255.255.255.248
216.218.128.0			255.255.128.0
64.71.128.0			255.255.192.0
65.19.128.0			255.255.192.0
66.220.0.0			255.255.224.0
209.51.160.0			255.255.224.0
216.66.0.0			255.255.224.0
64.62.128.0			255.255.128.0
66.160.128.0			255.255.128.0
216.218.130.128			255.255.255.248
64.71.191.56			255.255.255.248
66.220.4.240			255.255.255.240
216.218.229.80			255.255.255.240
216.218.158.16			255.255.255.240</description>
		<content:encoded><![CDATA[<p>This list may be easier to read than the last:</p>
<p>Denied IP Listing and Netmask for Hurricane Electric<br />
216.218.130.136			255.255.255.248<br />
216.218.128.0			255.255.128.0<br />
64.71.128.0			255.255.192.0<br />
65.19.128.0			255.255.192.0<br />
66.220.0.0			255.255.224.0<br />
209.51.160.0			255.255.224.0<br />
216.66.0.0			255.255.224.0<br />
64.62.128.0			255.255.128.0<br />
66.160.128.0			255.255.128.0<br />
216.218.130.128			255.255.255.248<br />
64.71.191.56			255.255.255.248<br />
66.220.4.240			255.255.255.240<br />
216.218.229.80			255.255.255.240<br />
216.218.158.16			255.255.255.240
</p>
]]></content:encoded>
				</item>
</channel>
</rss>
