Archive for the 'Bots' Category

Guestbook spammer’s bot?

Tuesday, April 19th, 2005

Guestbook spam info

I caught an IP address from Atrivo accessing a non-existent page in my guestbook on annelisabeth, and wondered why on earth it would. It soon became clear when I checked my logs for that file: it had been accessed three times.

1) Googlebot
Google then put it in the index, even though the page is blank.

2) 69.50.176.146
On Atrivo’s network, which usually means a spammer’s lair.
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Opera 7.20 [en]

3) 80.227.56.42
On dubaiinternetcity.net’s network, which practically guarantees it’s an open proxy.
User Agent: Opera 7.51

My guess: The second access was a spambot searching Google for possible guestbook spam targets.

The Atrivo bot returned the next day and downloaded my cat diary category and the January archives over and over.

Faking spambot

Saturday, April 9th, 2005

One of my grep queries filters out known referers and empty referers.

So when one bot fakes the empty referer, it’s included in my results.

One such bot is:
70.84.210.42
42.70-84-210.reverse.theplanet.com
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
The IP is a web server, but no websites are recorded for it.

My guess is that this one is associated with the recent trackback spam run.

The script that does the trackback spamming also has faked empty referrers.

So I’d block that IP address in .htaccess…
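
A minimal sketch of what that block could look like, assuming the older Order/Allow/Deny access directives are available on the server (Apache 2.4 and later use Require instead):

```apache
# Assumption: Apache 1.3/2.0-style access control (Order/Allow/Deny).
# Let everyone in by default...
Order Allow,Deny
Allow from all
# ...except the address that sent the faked empty referer.
Deny from 70.84.210.42
```

With Order Allow,Deny the Deny lines are evaluated last, so the single address is refused while everyone else still gets through.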

PSI bot

Friday, April 8th, 2005

I found a bot because it had gone for my trackback file with a GET request, the same thing the search engine spiders do.

IP number:
38.118.25.61

That’s within the Performance Systems International Inc. IP block, but the DNS servers belong to cogentco.

User agents:
Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 3.51)
Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 5.0)
Mozilla/4.0 (compatible; MSIE 5.05; Windows NT 4.0)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Webmasterworld has wondered about this one before.

No idea what this one’s actually doing.

Blocking Java suckers

Thursday, March 24th, 2005

There are a number of bots using Sun’s Java implementation. I found one of the IP numbers on a list of honeypot-trapped IP numbers for e-mail harvesting.

So I’m banning the suckers.

Here’s how you can do it, in .htaccess:

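# Flag any user agent containing a Java 1.4.x or 1.5.x version string
SetEnvIfNoCase User-Agent "Java/1\.4\." spambot=yes
SetEnvIfNoCase User-Agent "Java/1\.5\." spambot=yes
# Refuse every request that was flagged above
deny from env=spambot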

The reason I’m not banning Java outright and being done with it is that it might be used for legitimate bots as well. For more background, read the Webmasterworld thread on this.

Update April 18
I found an entry in my log that had been blocked. Not a good thing, because it was a link checker from Dmoz. User agent (in this case):
TulipChain/6.03 (http://ostermiller.org/tulipchain) Java/1.4.2_05 (http://apple.com/) Mac_OS_X/10.3.9
I’ve been put in the bookmarks section of an editor there, so that’s why the link checker came by. I think I need to change the .htaccess. I’ll see what I can figure out.
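
One possible change, as a sketch only: anchor the pattern to the start of the user agent, so that only agents beginning with Java/ are flagged. This assumes bare Java clients send an agent that starts with "Java/", which I haven't verified for every bot in my logs. The second rule is an alternative that clears the flag for the link checker by name; either one alone should let TulipChain through.

```apache
# Assumption: bare Java clients send an agent that starts with "Java/",
# while TulipChain's agent starts with "TulipChain/" and only mentions Java later.
SetEnvIfNoCase User-Agent "^Java/1\.[45]\." spambot=yes

# Alternative: remove the flag again when the agent names the link checker.
SetEnvIfNoCase User-Agent "TulipChain" !spambot

# Assumption: older Order/Allow/Deny access directives are available.
Order Allow,Deny
Allow from all
Deny from env=spambot
```

SetEnvIf rules are processed in the order they appear, so the exemption has to come after the rules that set the flag.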

Nameprotect

Sunday, March 20th, 2005

I’d consider banning this one:

User agent:
NP/0.1 (NP; http://www.nameprotect.com; npbot@nameprotect.com)
IP addresses:
12.175.0.43
24.177.134.6

The bot goes for robots.txt and then the root of the site, and on annelisabeth.com it also went for some of my organizing pages as well as a blog post.

And here’s the marketing blurb on their homepage:

NameProtect® is a Digital Asset Protection company providing eMarket Intelligence to leading corporations. We proactively provide protection of brand assets, recovery of diverted revenues and detection of online identity theft and fraud.

Hmm, I suppose I’ll let it roam on mine, if they actually DO something to all the spammers? But for most sites, a ban is not a bad idea.

Dumb spiders

Sunday, March 20th, 2005

The whitebear address that’s redirecting to me after a spam run is banned by Google.

That doesn’t stop Googlebot from trying, and trying, the different URLs on that domain that were spamvertized. They all meet a 404 on my site.

Both Yahoo’s and Google’s bots keep trying.

Blech…

African sucker

Friday, March 18th, 2005

I had a visit from a bot from the African IP number
81.91.227.195

user agent:
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt; DTS Agent

He sucked down every file on annelisabeth.com, including all posts on the blog. I’ve got a nice fat 200 K spike on my bandwidth meter from the sucker: 18.67 MB total in less than 20 minutes.

But he came in from a cached page on Google, with this search term:
09 tag 2005 update monica hotmail com

And the user agent of his browser was:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)

Ban with extreme prejudice! He’s already accessed spamhuntress, and I’ve got a spike there too, though I don’t know if it’s him - yet.

———–

I’ve had several visits from the same user agent. A few different IP addresses, and mostly accesses to my blog and the About Me page on annelisabeth.com
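
Since the same user agent turns up from several IP addresses, a ban on the agent string may hold up better than a ban on one address. A minimal sketch, assuming mod_setenvif and the older Order/Allow/Deny directives are available; the spambot variable name is arbitrary and simply reuses the one from my Java rules:

```apache
# Assumption: mod_setenvif and Order/Allow/Deny access control are enabled.
# Flag any request whose user agent contains "DTS Agent" (case-insensitive).
SetEnvIfNoCase User-Agent "DTS Agent" spambot=yes

# Let everyone in except flagged requests and the one address logged above.
Order Allow,Deny
Allow from all
Deny from env=spambot
Deny from 81.91.227.195
```

The catch is that the second user agent he showed up with is an ordinary MSIE 6.0 string, so the agent rule only covers the downloader itself.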

Bandwidth sucking bot

Wednesday, March 9th, 2005

One IP number was at the top of the list of requests for pages in Awstats.

195.92.228.190
Mozilla/4.0 (compatible; MSIE 4.0; Win32)

It has sucked down 15 megabytes on annelisabeth.com so far this month and 47 megabytes in February. First occurrence was February 4th.

It keeps sucking down the same posts over and over. I’m simply not willing to believe this is a human, even though it loads the css file and an occasional image. Not the background image loaded by the CSS, though.

I guess it resembles the subscription feature in IE a few years ago, where you could download a site for offline browsing, and you could set how often it should check for updated pages?

Kicking myself for not noticing this sooner, but in the .htaccess it goes.

Heh, I should footnote this with the fact that Googlebot has sucked down 24.5 megabytes so far this month, and is nowhere near being banned…

Mystery bot

Wednesday, March 9th, 2005

I just came across this bot in my annelisabeth.com and spamhuntress logs:

Mozilla/4.0 (compatible; MSIE 6.0; Windows XP Professional Bot v.5.)

It sucked down an impressive number of posts March 5th.

IP: 66.250.57.114
colo-66-250-57-114.pilosoft.com

I’m guessing that name means it’s a colocated server?

So why’s it sucking down blog content, including categories, About Me etc?

No referrers, and all but the blog itself were fetched with HEAD.

Automatic blogspot spam

Wednesday, March 9th, 2005

Found through Photo Matt:

Wholelottanothing tells of how a friend was at an anti-spam conference, and checked out a demonstration of a spam tool that included e-mail spamming, capped off by the creation of a spammy blogspot blog.