Submitted by : simon at: 2003-10-26T21:32:34+00:00 (13 years ago)

We use many hairy regular expressions that are expensive (i.e. they do a lot of backtracking, I think). Until now they have seemed to just work, but with increasing numbers of pages and larger pages, problems are becoming apparent.

This made #226 (on FreeBSD, Python/Zope may crash repeatably when browsing diffs or saving certain pages) much more likely, on FreeBSD at least. Now that Python has been patched, we are starting to see more helpful "maximum recursion limit exceeded" messages, e.g. when saving long pages. Collecting these here.

So, all zwiki's regexps should be reviewed with an eye to optimization. Ideas for estimating/measuring/logging the recursion/expensiveness of each one would be very welcome. Most but not all are in Regexps.py.
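
One possible way to measure expensiveness, offered as a rough sketch rather than anything from the Zwiki code: time each pattern against inputs of growing size and record whether the engine gives up. The helper names `probe` and `growth_table` and the sample pattern are made up for illustration, and the code is written for current Python 3, not the Python 2 sre module shown in the tracebacks on this page. Newer `re` versions no longer raise the recursion error, so there the timing column is the useful signal.

```python
import re
import time


def probe(pattern, text, flags=0):
    """Search text with pattern; return (elapsed_seconds, error_message_or_None)."""
    compiled = re.compile(pattern, flags)
    start = time.perf_counter()
    try:
        compiled.search(text)
        error = None
    except RuntimeError as exc:
        # Old sre versions raised RuntimeError("maximum recursion limit exceeded").
        error = str(exc)
    elapsed = time.perf_counter() - start
    return elapsed, error


def growth_table(pattern, make_text, sizes):
    """Probe the pattern on make_text(n) for each n; return (n, seconds, error) rows."""
    rows = []
    for n in sizes:
        elapsed, error = probe(pattern, make_text(n))
        rows.append((n, elapsed, error))
    return rows


if __name__ == "__main__":
    # Hypothetical worst case: nested quantifiers followed by a forced mismatch.
    for n, seconds, error in growth_table(r"(a+)+$", lambda n: "a" * n + "b", [10, 14, 18]):
        print(n, round(seconds, 4), error)
```

If the time roughly doubles each time the input grows by one character, the pattern is backtracking exponentially and is a candidate for rewriting. The same table could be written to the Zope event log instead of printed.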

 2002-12-11T19:26:51 ERROR(200) SiteError  http://zwiki.org/GeneralDiscussion200209/PUT
 Traceback (innermost last):
  Module ZPublisher.Publish, line 98, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 39, in call_object
  Module Products.ZWiki.ZWikiPage, line 2650, in PUT
  Module Products.ZWiki.ZWikiPage, line 2109, in edit
  Module Products.ZWiki.ZWikiPage, line 2168, in _handleEditText
  Module Products.ZWiki.ZWikiPage, line 2483, in _cleanupText
  Module sre, line 63, in sub
  Module sre, line 164, in _sub
  Module sre, line 179, in _subn
 RuntimeError: maximum recursion limit exceeded

 2002-12-11T19:29:23 ERROR(200) SiteError http://zwiki.org/GeneralDiscussion200210/PUT
 Traceback (innermost last):
  Module ZPublisher.Publish, line 98, in publish
  Module ZPublisher.mapply, line 88, in mapply
  Module ZPublisher.Publish, line 39, in call_object
  Module Products.ZWiki.ZWikiPage, line 2650, in PUT
  Module Products.ZWiki.ZWikiPage, line 2109, in edit
  Module Products.ZWiki.ZWikiPage, line 2180, in _handleEditText
  Module Products.ZWiki.ZWikiPage, line 2450, in _setText
  Module Products.ZWiki.ZWikiPage, line 260, in _preRender
  Module Products.ZWiki.ZWikiPage, line 232, in _render
  Module Products.ZWiki.ZWikiPage, line 411, in render_stxprelinkdtmlhtml
  Module Products.ZWiki.ZWikiPage, line 1315, in _preLink
  Module Products.ZWiki.Utils, line 115, in withinSgmlOrDtml
  Module Products.ZWiki.Utils, line 129, in sgmlAndDtmlSpansIn
 RuntimeError: maximum recursion limit exceeded

 the ones in diff

DeanGoodmanson, 2003/01/22 07:28 GMT (via web):
additional testimony

Earlier I posted this to the wrong page. :-(

More crash testimony, from an OS X point of view.

For the last few days I've experienced similar crashes on Zwiki 0.11.0rc1, SPVI Zope 2.5.1 on OS X. Database: 85M, in need of packing to the usual ~25M.

What triggers it: creating a page and adding a huge chunk of Word-generated HTML, html & body tags included. Later I added it without those tags and got no crash. Also: browsing through the diff of a 48K page.

property change --SimonMichael, 2003/06/01 22:33 GMT
Severity: serious => normal


comments:

property change --SimonMichael, 2003/07/23 20:16 GMT
Category: installation => general

potential pattern sighting note --DeanGoodmanson, 2003/07/24 21:15 GMT
When I've seen this, it's been on pages with a lot of SGML, and when they were processed through a "diff".

... -- 2003/07/26 03:57 GMT
I have the problem too. And my page is still not very large. I think this bug is serious for me.

property change --simon, Mon, 25 Aug 2003 06:48:53 +0000
Title: 'IssueNo0395? zwiki's regular expressions may fail with large pages/sites' => 'IssueNo0395? zwiki's regular expressions may fail with large pages'

property change --simon, Mon, 25 Aug 2003 07:01:43 +0000
Title: 'IssueNo0395? zwiki's regular expressions may fail with large pages' => 'IssueNo0395? zwiki's regular expressions may fail with large pages/sites'

shot in the dark -- Tue, 23 Sep 2003 11:17:27 -0700
Might something like this improve the memory bloat? http://simon.incutio.com/archive/2003/09/17/sexeger

shot in the dark --SimonMichael, Tue, 23 Sep 2003 13:24:55 -0700
That's wild! Thanks for the link.

Arg! --DeanG, Fri, 10 Oct 2003 09:00:13 -0700
This issue is getting increasingly frustrating in my world, primarily in the world of "diff". Are there diff algorithms we can steal from other GPL wikis?

New Insight! --DeanGoodmanson, Tue, 28 Oct 2003 21:14:12 -0800
I created DiffTests to try to recreate the large page diff problems I've been seeing.

Here's the interesting thing: I get the Zope restart when /diff'ing a large page against nothing, for instance just after a pack.

Workarounds: Disable /diff when there are no changes in history.

diff has been fixed --simon, Wed, 18 Feb 2004 16:44:08 -0800
And I rarely see this issue elsewhere. Still, it would be nice if this was solved, so I'm leaving it open.

related to HTML and BODY tag stripping? -- Wed, 22 Sep 2004 14:30:37 -0700
I saw this today on a 0.17.0 version with a large page containing a lot of html tags, and most notably HTML and BODY tags surrounding the document. It crashed Zope on save, but not when saved without the HTML or BODY tags.

Digging through code I found the regex for "strip html & body added by some zope versions" which, although moved to stx.py, hasn't changed between versions.

I tried to reproduce this on DiffTests, but noticed there's new handling for this: the html and body tags I added got removed (at least the end tags). Some of the versions didn't seem to show up in the history.
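
For what it's worth, stripping a surrounding html/body wrapper does not need a pattern that spans the whole page. Below is a minimal sketch using plain string searches, which cannot backtrack or recurse. The actual Zwiki regex is not reproduced here, and the function name `strip_html_body_wrapper` is made up for illustration. It assumes tag names are plain ASCII and that the first `<body` and last `</body` in the text are the wrapper.

```python
import string

# Lowercase only ASCII letters so string offsets stay aligned with the original.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def strip_html_body_wrapper(text):
    """Return what lies between <body ...> and </body>, or text unchanged."""
    lowered = text.translate(_ASCII_LOWER)
    start = lowered.find("<body")
    if start == -1:
        return text  # no opening body tag: nothing to strip
    open_end = lowered.find(">", start)
    if open_end == -1:
        return text  # malformed opening tag: leave the page alone
    close = lowered.rfind("</body")
    if close == -1 or close < open_end:
        return text  # no closing tag after the opening one
    return text[open_end + 1:close]
```

Each step is a single linear scan, so the cost grows in proportion to page size no matter how many tags the page contains.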