Minimizing profile spam

My friend used to have a mod for phpBB that closed the member list to non-members. But I figured that if the point was to make forum profile spam less attractive to spammers, a robots.txt file might be enough:

User-agent: *
Disallow: /forums/memberlist.php
Disallow: /forums/profile.php

There’s a lot more you could block, and you can probably find suggestions on the phpBB forum. But this should be enough to make phpBB forum profile spam not worth the effort.
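As a quick sanity check, here is a minimal sketch of how a well-behaved crawler would read the rules above, using Python’s standard urllib.robotparser module. The helper name is_allowed, the user agent "ExampleBot" and the sample URLs (including the profile query string) are made up for illustration. It only models a crawler that chooses to honour robots.txt; nothing here is enforced by the web server.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/memberlist.php
Disallow: /forums/profile.php
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if a robots.txt-respecting crawler may fetch `path`.

    `user_agent` is a hypothetical crawler name; the rules above apply to
    every agent because of the `User-agent: *` line.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Illustrative paths only; adjust them to your own forum layout.
    for path in (
        "/forums/memberlist.php",
        "/forums/profile.php?mode=viewprofile&u=2",
        "/forums/viewtopic.php?t=1",
    ):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Disallow rules match by path prefix, so the profile page stays blocked whatever query string follows it, while topic pages remain crawlable.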

7 Responses to “Minimizing profile spam”

  1. ThaNerd says:

    Robots.txt is a standard format created to let the authors of spidering robots know which URIs their robots are (not) allowed to browse.

    Your web server does not do anything with it (except on rare occasions: some servers have a “plugin” that reads it and enforces the rules defined in it, but even I didn’t set that up).

    Now, please consider these two points:
    1. Did you ever see any spammer access /robots.txt?
    2. Try this experiment:
    Take an 8-year-old kid. Put him in an empty room where the only thing is a jar. On the jar is a Post-it with the words “don’t eat the cookie” written in large, red capital letters. Place 3 cookies in the jar.
    Shut the door and, after two hours have elapsed, open the jar. Chances are the three cookies have disappeared.

    Experiment number 2: replace the kid with a spammer and the jar with a website. Replace the cookies with e-mail addresses. Replace the Post-it with a few URIs in a robots.txt. Expect the same result as for experiment 1.

  2. Administrator says:

    You’ve completely missed the point.

    The point is that the spammers want bang for their buck. So they look for forums via Google or whatever. Not being able to find the member list on Google will hopefully get them to bypass that forum. But even if they don’t, they won’t get any added page rank, because Google does adhere to robots.txt.

    THAT is the point.

  3. Joe says:

    A little research shows that VPC’s comment is spam; Google shows a lot of referrer spam for his domain. His whole post is just the name of the subdomain he spammed and a copy of your post’s title. Both sneaky and pathetic at the same time.

  4. ThaNerd says:

    To answer your post, I’ll say that:
    1. Google is NOT the only search engine out there. There are probably some search engines that DON’T CARE what robots.txt says. I don’t know any, but I’m pretty sure they (the spammers) know one…
    2. The most efficient way to find forum user lists to spam is to target one engine (in this case “phpbb”). Why would you bother searching for “forum profile” when you can search for the phpbb signature that you are required to leave as is at the bottom of all pages of your BB? Something like “Powered by phpbb” is enough. Filter the results, for each hit try to guess the base URL of the forum, and let the script batch-subscribe/spam.

    THAT is the point.

    If there are humans who can find your bulletin board, then spammers can find it too! So there’s no point in limiting Google (or any other search engine’s bot).

  5. Joe says:

    1. Google is the biggest search engine. It indexes the most pages. It crawls pages frequently. Currently it is the most important search engine to worry about. Google and other legitimate commercial search engines follow robots.txt. Smaller educational or custom engines may not, but they are also not going to be indexing nearly the number of pages a large engine does. Most spammers find sites through Google or other major search engines. If you prevent those search engines from indexing the potential victims, spammers are far less likely to find them. If they can’t find them, they can’t spam them. But why would they anyway? The main purpose of almost all web spam is to improve their sites’ PageRank and other engines’ link rankings. If the major search engines have not indexed a page, there is no point in spamming it.

    2. There is not so much difference between one forum software and another, or even between a forum and a blog comment field or a guestbook. They are all just HTML forms. Few web spammers stick to one target. They are doing this as a business; why would they limit their attacks to one kind of site? Most spammers do not find victim sites by looking for things like “forum profile” or “powered by phpbb”; they find victims by searching for competitors’ URLs or their own. By finding places that have already been spammed, they know the site is unlikely to be cleaned. Spamming sites that are constantly cleaned does them no good and wastes time they could spend spamming better targets. At some point a spammer obviously reached those sites first through some kind of search like you mentioned, but many spammers likely never worry about finding their own new targets.

  6. Carbonize says:

    As of build 2.0.13 phpBB has the option to require image verification when a new member registers.

  7. Rob says:

    Ann - I get your point, and it is very valid. Thanks for the tip. I have just started to search on this topic, and preventing the spammers’ URLs from getting indexed by Google already provides me with a great amount of satisfaction :) Now all I gotta do is stop them signing up completely.
