Block Snoopy

I got thoroughly spidered yesterday, by some unknown entity.

205 MB from 23/Jan/2006:03:28:34 to 23/Jan/2006:11:47:00 -0600

IP number:
83.64.251.92

User agent:
Snoopy v1.2

Which led me to this little project:

SourceForge.net: Snoopy

What’s interesting is that it tried to retrieve pages of this form:

GET /index.php?year=2005&monthnum=07&day=06&name=revenge-referrer-run&page=

It’s a site ripper. But I’m not keen on that kind of inconsiderate ripping, so I’d advocate banning Snoopy outright. Not by IP number, but by user agent.

The IP number is revealing by itself, though. It’s some sort of news site in German, owned by someone on Mallorca in Spain. It doesn’t appear to have any incoming links, and the domain name is from December last year. Looks like it’s owned by some SEO types, which makes me all the more suspicious.

Hmm, on further thought, block the IP as well…
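
For anyone who wants to try both bans, here is a minimal sketch, assuming Apache with mod_rewrite enabled and rules placed in an .htaccess file or virtual host. The IP and user agent are the ones from my logs above; answering with 403 Forbidden is just one reasonable choice, so adjust to taste.

```apache
# Turn on the rewrite engine (needs mod_rewrite to be loaded)
RewriteEngine On

# Match any user agent that starts with "Snoopy", ignoring case...
RewriteCond %{HTTP_USER_AGENT} ^Snoopy [NC,OR]
# ...or any request coming from the one IP number seen in the logs
RewriteCond %{REMOTE_ADDR} ^83\.64\.251\.92$

# Serve nothing and answer 403 Forbidden
RewriteRule ^ - [F]
```

Bear in mind that the user agent check only catches clients that announce themselves as Snoopy, and the IP check only catches this one address.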

14 Responses to “Block Snoopy”

  1. Alden Bates Says:

    Ahhh, Snoopy. I already banned it because someone was using it to try to trackback spam me back in September. The IPs don’t match that one, though.

  2. Shawn Kerr Says:

    Yeah I’ve had Snoopy banned for a while myself. I never really care what the rhyme or reason is, I ban pretty much all bots, save for a few exceptions. I don’t need everyone and their brother’s bots coming to my sites all the time. I’m pushing about 180 blocked UAs on one site. Give or take a few on the others.

  3. Mark J Says:

    Snoopy isn’t a site ripper… it’s just a PHP class for performing web-browsing tasks. You’re using WordPress for this blog… WordPress uses Snoopy. /wp-includes/class-snoopy.php in 1.5.x branches of WP. It’s a very handy class with many legitimate uses… I wouldn’t encourage people to ban it by user agent.

  4. Administrator Says:

    Oh, but I WOULD!

    200 megabytes isn’t trivial.

    Could you instead tell me exactly what the fallout would be from a Snoopy ban, and then let users decide if they want to risk it or not?

  5. Herschel Says:

    There shouldn’t be a huge impact from blocking the user agent. I checked my logs for one of my sites for 2005, and didn’t have a single request from a Snoopy user agent all year.

    One of the nice things about Snoopy is that it’s trivial to change the user agent that it uses. I’d think that a well-behaved Snoopy-using script would have already set the user agent to something that identified what it is.

    On the other hand, it also means it’s easy for the bad scripts to become Internet Explorer.

    And I’ll second the “it’s not a site ripper” comment. Snoopy just makes it easy to request pages. If something’s spidering your site and grabbing tons of pages, that’s not Snoopy’s doing; that’s some other spidering code in there that’s just using Snoopy to fetch each page.

  6. Scott Johnson Says:

    This fool has sucked up about 1.5GB from one of my sites over the course of the last five days. The IP is already banned. Now I’m trying to decide whether to ban the user agent altogether. The way I see it, a simple HTTP class is really all that’s necessary for any legit software. If the software needs to use a package that emulates a browser, it’s probably not legit. And yes, if WordPress uses this, I hate it even more. :)

  7. Lemat Says:

    Should do the trick:

    RewriteCond %{HTTP_USER_AGENT} ^Snoopy [NC]
    RewriteRule .* http://download.microsoft.com/download/1/6/5/165b076b-aaa9-443d-84f0-73cf11fdcdf8/WindowsXP-KB835935-SP2-ENU.exe [R=301]

    - it’s Apache mod_rewrite

  8. Jason Pearce Says:

    Lemat, while I appreciate the humor in your RewriteRule, I don’t recommend anyone using it. For starters, it unfairly transfers an attack on one user’s website over to Microsoft, which had nothing to do with it. Now I’m not a Microsoft fan, but they receive enough attacks as it is.

  9. Nigel Horne Says:

    That’s why I wrote mod_spambot!

  10. Dan Gibas Says:

    Hi All,

    It seems I had a lucky escape from Snoopy… The site it entered is all on one PHP page with a dynamic variable in the query string. Snoopy failed to navigate my site or do much at all, so it seems it can’t handle the query strings.

    Lucky my site being probed is all on index.php ;o)

    Here is my custom log of the activity… from this you can also see that it was trying to take 1 page per second (which is really mean and could bring some sites down DoS-style).

    [871] [Date: Tuesday 29th of August 2006 02:20:54 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    [872] [Date: Tuesday 29th of August 2006 02:20:55 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    [873] [Date: Tuesday 29th of August 2006 02:20:56 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    [874] [Date: Tuesday 29th of August 2006 02:20:56 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    [875] [Date: Tuesday 29th of August 2006 02:20:57 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    [876] [Date: Tuesday 29th of August 2006 02:20:58 AM] [IP: 62.141.58.48]
    [UA: Snoopy v1.2]
    [Ref: n/a]
    [Viewed: /index.php?]

    Final thought…

    Are the developers paying royalties to Snoopy the cartoon character?

    Dan

  11. Static Brain Says:

    Snoopy is now ripping me apart from two IPs: 62.141.58.48, which someone already mentioned, and now 87.118.100.27. I already banned it. From what I read about it, it cannot be good. Thanks for posting this. Fighting spam is a full time job anymore.

  12. Spamhuntress » Blog Archive » Fighting spam is a full time job Says:

    […] The quote is from a comment on the Block Snoopy post. […]

  13. Clayton Says:

    You do realize you can change the user agent with Snoopy so that it looks like real browser traffic….

    Snoopy is an amazing class…

  14. micenterprise » Blog Archive » Let’s hunt Snoopy Says:

    […] So, the Counterize plugin on my blog keeps simple statistics on which browser, IP, etc. has visited. Lately there was always a browser called Snoopy. I figured that, since Katrin likes Snoopy, maybe she had tuned her browser to claim to be a Snoopy. Well, as things go, Snoopy was there very, VERY often, and yesterday while chatting I asked whether I should be worried because she was stopping by every 2 hours. She said no, she has an RSS feed reader and always sees when something new comes in. Hm, so I googled Snoopy and, lo and behold, Snoopy is a little program that pulls down web pages. Cheeky, huh? OK, a normal browser pulls down web pages too, but Snoopy does it automatically and only simulates a browser in order to do something else with the content afterwards. Now I’m going to think about how to block Snoopy. I’m not the first one Snoopy has visited, either. […]
