I have awstats reports for the site since 2004. Is it safe to link to them?


A zwiki web turns out to be a trap for web crawlers - they get in but they can't get out! On this site the bots have been wandering in the hall of mirrors created by acquisition, mostly receiving 404s and 500s, which leads to some very large hit rates. In the 1999/2000 reports roughly 90% of hits are from robots.

In December 2000 this site received 1.5 million hits, with a peak hit rate of 37000/hr (~10/s)! Happily we didn't notice! Someone deserves praise here.

As of 2000/12 I have excluded /zwiki in http://joyful.com/robots.txt, but I would like to allow search engines to index the site. robots.txt doesn't seem powerful enough to help much. It may be possible to close off the hall of mirrors by careful link control.
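
To illustrate how coarse that exclusion is, here is a minimal sketch using Python's standard urllib.robotparser. The two-line rule set is an assumption about what "excluding /zwiki" looks like, not a copy of the actual robots.txt, and "ExampleBot" is a made-up agent name.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: block every robot from everything under /zwiki.
RULES = """\
User-agent: *
Disallow: /zwiki
"""

def make_parser(rules_text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

def allowed(url, agent="ExampleBot"):
    """Return True if the assumed rules let this agent fetch the URL."""
    return make_parser(RULES).can_fetch(agent, url)

if __name__ == "__main__":
    for url in ("http://joyful.com/zwiki/FrontPage", "http://joyful.com/"):
        print(url, allowed(url))
```

Disallow rules match by path prefix only, so a rule like this hides the whole wiki from well-behaved crawlers instead of just the looping URLs.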

I tried to generate a report excluding robots. I like webalizer, but it seems the current release can't exclude agents and limit the URLs to /zwiki at the same time. Accurately pre-filtering the logs by hand is tricky.
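
A rough sketch of such a pre-filter in Python, assuming the logs are in Apache "combined" format. The robot hints below are illustrative substrings, not a complete list, and the function names are made up for this example.

```python
import re

# Apache/NCSA combined format:
# host ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Illustrative only; real robot lists are much longer.
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")

def is_robot(agent):
    """Guess whether a User-Agent string belongs to a robot."""
    agent = agent.lower()
    return any(hint in agent for hint in ROBOT_HINTS)

def filter_lines(lines, prefix="/zwiki"):
    """Yield log lines for URLs under prefix that do not look like robots."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        if not match.group("path").startswith(prefix):
            continue
        if is_robot(match.group("agent")):
            continue
        yield line
```

The output of filter_lines could be written to a new file and fed to the log analyzer. Robots that send a browser-like User-Agent will slip through, which is part of why doing this accurately is tricky.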

Am I right in thinking web stats reports are (still) HARD? Demanding boatloads of time and concentration?

See also TheRobotProblem

hits per page ? --Wed, 07 Jan 2004 06:22:03 -0800

Can I see the number of hits per page? Why offer or contribute quality info if no one sees it? Please reply here or email me at abrin40291@yahoo.com to let me know you saw this (no spam, please!).

hits per page ? --Simon Michael, Thu, 08 Jan 2004 02:07:36 -0800

Currently you have to analyse log files to get this (e.g. http://zwiki.org/logs/zwiki.org/usage_200401.html#TOPURLS ). As far as I know there is no live hit counter add-on for Zope, because the ZODB doesn't handle frequent writes well. Hopefully someone will correct me.
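
For a quick count without a full log analyzer, something like the following Python sketch might do. It assumes Apache common or combined log format, counts only successful GET requests, and ignores query strings; hits_per_page is a name made up for this example.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /FrontPage HTTP/1.0" 200
REQUEST = re.compile(r'"GET (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')

def hits_per_page(lines):
    """Count successful (status 200) GET requests per URL path."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("status") == "200":
            # Drop any query string so /Page?x=1 and /Page count together.
            path = match.group("path").split("?", 1)[0]
            counts[path] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python hits.py < access.log
    for path, count in hits_per_page(sys.stdin).most_common(20):
        print(count, path)
```

This counts robots too, so the numbers will be inflated unless robot traffic is filtered out first.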

hits per page ? --PieterB, Thu, 08 Jan 2004 02:22:18 -0800

You either need to analyse the log files (see Simon's comment), or include a link to an image hosted on a server whose log files you can access. Remember to filter out robots, e.g. by using a log analyzer such as http://awstats.sf.net/ or by using some JavaScript-based filtering to exclude search engine robots (e.g. GoogleBot).