Also known as PageRank spam. This is the phenomenon of spammers posting comments or edits to open wikis, blogs and similar sites, containing one or more links to sites they are trying to boost in search engine rankings.
This is likely to affect your Zwiki sites if they become popular and allow editing by the general public. Here are various tactics you, the Zwiki admin, can use when this problem arises. (Features mentioned below may also be documented, or should be, in configuring.)
- manual repair
Keep ahead of it with manual repairs, using an "all edits" subscription, many eyeballs, and the diff form's "revert all" button or the ZMI history tab when repairs are needed. You'll need to do this more often than you pack your database, since packing discards old revisions and you may then not have sufficient edit history to revert a page easily. Of course you can remove the spam links with a fresh edit, but that's more work, especially if links have been added in obscure places.
This can be a practical solution depending on: how many wiki gardeners are available to give it consistent attention, and how much tolerance you and subscribers have for spam incidents. If it's just you doing cleanup, you're responsible for many wikis, and you sometimes get busy with other things - you'll need a more automated solution.
- have an "all edits" subscription going to a mail folder and check it periodically.
- try to revert edits within 24 hours - Zwiki tells search engines not to index pages edited less than 24 hours ago. Allowing link spam to get indexed by search engines may attract more of it.
2006/05: latest Zwiki code has a better revert - it will undo renames, update last editor info and send notification - and two more powerful revert methods:
- SOMEPAGE/revertEditsBy?username=USERNAME - reverts one or more last edits to this page by USERNAME. Useful if someone repeatedly posts spam to one page.
- SOMEPAGE/revertEditsEverywhereBy?username=USERNAME - does the above for every page in the wiki last edited by USERNAME. Useful if someone spams many pages as the same username. Use with care! If there are many large pages, this can take a long time. Monitor progress in event.log, and optionally pass a &batch=N argument to commit more frequently if the transaction has trouble completing.
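If you call these methods often, a small helper for building the URLs can save typing. This is only a sketch: the method names and the username and batch arguments come from the text above, but the helper name revert_url and the example.org wiki address are made up for illustration.

```python
from urllib.parse import urlencode


def revert_url(page_url, username, everywhere=False, batch=None):
    """Build a revertEditsBy or revertEditsEverywhereBy URL (hypothetical helper)."""
    method = "revertEditsEverywhereBy" if everywhere else "revertEditsBy"
    params = {"username": username}
    # batch is only described for revertEditsEverywhereBy, so only pass it there
    if everywhere and batch is not None:
        params["batch"] = batch
    return f"{page_url.rstrip('/')}/{method}?{urlencode(params)}"


if __name__ == "__main__":
    # example.org is a placeholder, not a real wiki
    print(revert_url("http://example.org/wiki/SomePage", "spammer"))
    print(revert_url("http://example.org/wiki/SomePage", "spammer",
                     everywhere=True, batch=50))
```

You would then visit the printed URL in a browser while logged in with sufficient permissions.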
You can block repeat offenders by adding known spam link patterns to a banned_links lines property on the wiki folder or above. Edits containing any of these patterns will be blocked. Eg, if one line of the banned_links property is "spamsite", then a comment containing "http://spamsite.com/" will be rejected.
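To get a feel for which edits a given banned_links line would catch, here is a rough model of the check. It assumes plain, case-sensitive substring matching, which fits the "spamsite" example above but may not be exactly what Zwiki does internally; the function name is_banned is invented for this sketch.

```python
def is_banned(text, banned_links):
    """Return the first banned pattern found in text, or None.

    Assumption: each line of banned_links is treated as a plain substring.
    """
    for pattern in banned_links:
        pattern = pattern.strip()
        # skip blank lines so an empty pattern does not match everything
        if pattern and pattern in text:
            return pattern
    return None


if __name__ == "__main__":
    banned_links = ["spamsite", ""]
    print(is_banned("see http://spamsite.com/ for deals", banned_links))
    print(is_banned("see the FrontPage for details", banned_links))
```

Under this model short patterns are risky: a line like "pill" would also reject an honest edit mentioning "caterpillar", so prefer distinctive parts of the spam hostname.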
Here's a unix command to extract the links from the text of already-spammed pages, which you could use as the basis for new banned_links entries:
grep -Eo "https?://[[:alnum:]_.-]*" spammed.txt | sort -u
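If you would rather do this from Python (say, in a script that processes many pages), the following sketch does roughly the same job. It matches http:// or https:// followed by the same character class as the command, so paths are cut off and you get one entry per host. The function name extract_links is made up for this example.

```python
import re

# Scheme plus host-like characters; stops at the first "/" or other character
LINK_RE = re.compile(r"https?://[A-Za-z0-9_.-]*")


def extract_links(text):
    """Return the sorted, de-duplicated link prefixes found in text."""
    return sorted(set(LINK_RE.findall(text)))


if __name__ == "__main__":
    sample = ("buy http://spamsite.com/pills and https://other-spam.example/x "
              "and http://spamsite.com/again")
    for link in extract_links(sample):
        print(link)
```

Review the output by hand before pasting anything into banned_links, since spammed pages usually contain legitimate links too.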
It's also possible to set up a spam mail-in address so that any subscriber receiving a mail-out containing spam links can forward it there (reply-and-quote, actually), and all the links in that edit will be added to banned_links.
- There may be a similar mechanism, banned_ips, for blocking editors coming from certain IP address ranges (unconfirmed). Controlling this at your webserver may be easier. See also BlockList.
- If true, blocks edits from unidentified users, ie anonymous users who have not saved a username cookie in options.
- Blocks edits from unidentified users containing more than the specified number of links to other sites.
- Similar to the above for cookie-identified users.
- This is more drastic: set your wiki's permissions to require a real zope login for comments and edits. This will solve your spam problem. However you lose the benefit of a traditional open wiki with its low barrier to participation. You can set up self-registration, possibly requiring a real email address, eg by using Zwiki and Plone, which may be a suitable compromise.
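The link-count limits described above (for unidentified and cookie-identified users) can be pictured with the sketch below. It is a guess at the general idea, not Zwiki's code: the function names, the own_host and max_links parameters, and the rule that a link counts as "to another site" when its hostname differs from the wiki's are all assumptions made for illustration.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def count_external_links(text, own_host):
    """Count http(s) links in text whose hostname is not own_host."""
    count = 0
    for url in URL_RE.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            host = None  # malformed URL; treated as external below
        if host != own_host:
            count += 1
    return count


def edit_allowed(text, own_host, max_links):
    """Allow the edit unless it has more than max_links external links.

    max_links=None means no limit is configured (an assumption of this sketch).
    """
    if max_links is None:
        return True
    return count_external_links(text, own_host) <= max_links


if __name__ == "__main__":
    edit = "x http://spam1.example/ y http://spam2.example/a z http://mywiki.example/FrontPage"
    print(count_external_links(edit, "mywiki.example"))
    print(edit_allowed(edit, "mywiki.example", 1))
```

A low limit for unidentified users and a higher one for cookie-identified users keeps ordinary comments working while stopping the typical spam edit that dumps dozens of links at once.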
update --simon, Sat, 08 Sep 2007 09:07:23 -0700
This page needs an update: mention the recent expunge* method additions/renames, use real headings, rename/reparent for better findability.