MediaWiki indexing problems

I’m using MediaWiki on my site. I like it a lot, and I was resting easy, assured that all outgoing links had nofollow on them.

So I’ve been wondering for some time why spammers bother spamming it at all.

I think I may have found out why.

RSS feeds.

Both the Atom and RSS feeds of RecentChanges are being indexed by Google. Not good. Although the links in those feeds aren’t clickable, I can still find the spammy buzzwords by searching Google for them together with
site:spamhuntress.com

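Until the feeds themselves change, one site-side workaround (not something the post itself proposes) is to ask crawlers to skip the feed URLs via robots.txt. The Python sketch below checks hypothetical rules with the standard library’s urllib.robotparser. The example.org host and the URL layout (short /wiki/ article URLs, with feeds served from /index.php?title=Special:Recentchanges&feed=rss) are assumptions, so adapt them to your own install.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assuming short URLs (/wiki/Page) for articles
# and /index.php?... for everything else, including the RSS/Atom feeds.
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php
"""

# Hypothetical feed URLs; the real layout depends on the wiki's configuration.
FEED_URLS = [
    "https://example.org/index.php?title=Special:Recentchanges&feed=rss",
    "https://example.org/index.php?title=Special:Recentchanges&feed=atom",
]
ARTICLE_URL = "https://example.org/wiki/Main_Page"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in FEED_URLS + [ARTICLE_URL]:
        verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
        print(f"{verdict}: {url}")
```

Run as a script, it should report both feed URLs as blocked and the article as allowed. Keep in mind that robots.txt only asks crawlers not to fetch a URL; a blocked URL that is linked from elsewhere can still show up in results without its content.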
Some spammers are smart, but many are just using tools, spraying and praying, and don’t have a clue about nofollow or other sticking points. So figuring out exactly what the MediaWiki spammers are THINKING is probably futile.

But the MediaWiki developers need to fix this. They need to put nofollow on those links, and on some others that Joe found. Joe, can we get a comment with your findings?
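MediaWiki’s own nofollow handling lives in its PHP parser and isn’t reproduced here. As a hypothetical illustration of what “putting nofollow on those links” means in the generated HTML, this Python sketch adds rel="nofollow" to external anchors in a fragment. add_nofollow is a made-up helper, and the regex is only a rough stand-in for a real HTML parser (it ignores edge cases such as a > inside an attribute value).

```python
import re

# Matches an opening <a ...> tag; group 1 holds its attribute text.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# An href pointing at an absolute http(s) URL, i.e. an outgoing link.
HREF_EXTERNAL = re.compile(r"""\bhref\s*=\s*["']?https?://""", re.IGNORECASE)
# Any existing rel attribute.
HAS_REL = re.compile(r"\brel\s*=", re.IGNORECASE)


def add_nofollow(html: str) -> str:
    """Add rel="nofollow" to external <a> tags that have no rel attribute.

    Hypothetical helper: a rough regex sketch, not MediaWiki's implementation.
    """
    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if HREF_EXTERNAL.search(attrs) and not HAS_REL.search(attrs):
            return f'<a{attrs} rel="nofollow">'
        return match.group(0)  # internal link or rel already set: leave as-is

    return A_TAG.sub(fix, html)


if __name__ == "__main__":
    print(add_nofollow('<a href="http://spam.example/">buy</a> '
                       '<a href="/wiki/Main_Page">home</a>'))
```

Links that already carry a rel attribute are left untouched, so the sketch never duplicates or overrides an existing value.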

3 Responses to “MediaWiki indexing problems”

  1. Joe Says:

    Since I recently started playing with MediaWiki 1.5, I have found some things like this myself. A few pages are being indexed even though there is no point in indexing them. They don’t attract spammers or create any kind of vulnerability, but I see no reason for them to be indexed:

    Special:Specialpages
    Special:Random

    I notice the printable pages don’t have any noindex in their headers, but I don’t see them in search engines. I guess Google is smart enough not to include pages with print arguments.

    I wonder whether rel=nofollow is even allowed in XML feeds, since XML is stricter than HTML. If the links to the RSS and Atom feeds had nofollow on them in the first place, the feeds would be less likely to show up in search engines.
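The pages listed in this comment, plus the printable and feed views, could be mapped to a per-URL robots policy. In this hypothetical Python sketch, robots_directive and NOINDEX_TITLES are illustrative names, and both the /wiki/Title and /index.php?title=... URL styles are assumed. For HTML pages the result would go in a <meta name="robots"> tag; since feeds are XML and have no such tag, the same value could instead be sent in an X-Robots-Tag HTTP header.

```python
from urllib.parse import parse_qs, unquote, urlparse

# Hypothetical set of titles that gain nothing from being indexed.
NOINDEX_TITLES = {"Special:Specialpages", "Special:Random"}


def robots_directive(url: str) -> str:
    """Pick a robots directive for a MediaWiki-style URL (hypothetical helper).

    Assumes articles live at /wiki/Title and other views at
    /index.php?title=...&printable=yes or ...&feed=rss|atom.
    """
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    if parsed.path.startswith("/wiki/"):
        title = unquote(parsed.path[len("/wiki/"):])
    else:
        title = query.get("title", [""])[0]

    if (title in NOINDEX_TITLES
            or query.get("printable") == ["yes"]  # printable view
            or "feed" in query):                   # RSS/Atom feed view
        return "noindex,nofollow"
    return "index,follow"


if __name__ == "__main__":
    for url in ("https://example.org/wiki/Special:Random",
                "https://example.org/index.php?title=Main_Page&printable=yes",
                "https://example.org/wiki/Main_Page"):
        print(robots_directive(url), url)
```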

  2. Brion Vibber Says:

    There aren’t any clickable links in the RSS feeds. Page text is rendered either as raw source text (for new pages) or as diff tables. Or… am I not following you?

  3. Administrator Says:

    True, there are no clickable links. But I don’t think that matters to spammers; at this point they don’t distinguish between clickable and non-clickable links.

    So my guess is that this MediaWiki feature is still a spammer magnet.

    On another note, I have had virtually no wiki spam since I added the regex that blocks the typical “invisible” wiki spam. I also protected the default pages against edits, which helped too.
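The comment doesn’t give the actual regex. In MediaWiki, an edit-blocking pattern like this would typically go in the $wgSpamRegex setting; the Python sketch below only illustrates the idea with a hypothetical pattern that flags markup hiding content from human readers via display:none, visibility:hidden, or a 0 or 1 pixel height. INVISIBLE_SPAM and looks_like_invisible_spam are illustrative names, not part of MediaWiki.

```python
import re

# Hypothetical pattern: a <div> or <span> whose style attribute hides its
# content (display:none, visibility:hidden, or a 0/1-pixel height).
INVISIBLE_SPAM = re.compile(
    r"""<(div|span)\b[^>]*style\s*=\s*["'][^"']*"""
    r"""(display\s*:\s*none|visibility\s*:\s*hidden|height\s*:\s*[01]px)""",
    re.IGNORECASE,
)


def looks_like_invisible_spam(wikitext: str) -> bool:
    """Return True if the submitted text wraps content in a hidden block."""
    return INVISIBLE_SPAM.search(wikitext) is not None


if __name__ == "__main__":
    edit = '<div style="overflow:auto; height: 1px;">[http://spam.example buy]</div>'
    print(looks_like_invisible_spam(edit))
```

A pattern this broad can also catch legitimate hidden markup, so it is worth testing against your own pages before using anything similar to reject edits.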
