MemoryError in update_db script caused by a 196 MB OOPS report

Bug #707079 reported by Diogo Matsubara
This bug affects 1 person
Affects             Status    Importance   Assigned to   Milestone
python-oops-tools   Triaged   Low          Unassigned

Bug Description

The update_db.py script is failing with the following traceback:

Traceback (most recent call last):
 File "bin/update_db", line 41, in <module>
   oopstools.scripts.update_db.main()
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/scripts/update_db.py", line 16, in main
   for oops in oops_store.find_oopses(start_date):
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dboopsloader.py", line 113, in find_oopses
   oops = self._load_oops(datedir, filename)
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/dboopsloader.py", line 128, in _load_oops
   os.path.join(datedir, filename))
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/models.py", line 515, in from_pathname
   data, reqvars, statements, traceback = _parse_msg(msg)
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/models.py", line 412, in _parse_msg
   exception_type, msg.getheader('exception-value'), prefix)
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/models.py", line 362, in _normalize_exception_value
   evalue = replace_variables(evalue)
 File "/srv/lp-oops.canonical.com/cgi-bin/lpoops/src/oopstools/oops/helpers.py", line 87, in replace_variables
   s = re.sub(r"'(?:\\\\|\\[^\\]|[^'])*'", '$STRING', s)
 File "/usr/lib/python2.6/re.py", line 151, in sub
   return _compile(pattern, 0).sub(repl, string, count)
MemoryError

This is caused by a 196 MB OOPS report that contains a huge SQL statement (the OOPS file is at devpad:/srv/launchpad.net-logs/production/wampee/2011-01-24/50402.K967).

One workaround is to move that oops out of the way so update_db can continue to do its job.
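
A minimal sketch of doing that workaround automatically, assuming the loader could stat each file before parsing it. `split_by_size` and `MAX_OOPS_BYTES` are hypothetical names that are not part of oops-tools, and the 50 MB limit is an arbitrary example value.

```python
import os

# Hypothetical threshold: update_db has no such setting; 50 MB is an
# arbitrary example chosen to sit well below the 196 MB report.
MAX_OOPS_BYTES = 50 * 1024 * 1024


def split_by_size(paths, max_bytes=MAX_OOPS_BYTES):
    """Partition OOPS file paths into (loadable, oversized) by on-disk size.

    Hypothetical helper: it only stats the files, so an oversized report is
    never read into memory or handed to the regex normalisation step.
    """
    loadable, oversized = [], []
    for path in paths:
        if os.path.getsize(path) > max_bytes:
            oversized.append(path)
        else:
            loadable.append(path)
    return loadable, oversized
```

The oversized paths could then be logged or moved aside for a human to look at, rather than letting a single report abort the whole run.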

description: updated
Changed in oops-tools:
status: New → Triaged
importance: Undecided → Critical
Robert Collins (lifeless) wrote:

What does that regex expect to replace? If it's expecting, e.g., a single line of input, then running it on each line of the SQL might avoid whatever pathological behaviour is occurring.
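
To illustrate the per-line idea, here is a sketch that reuses the exact pattern from helpers.py but feeds it one line at a time. `replace_strings_per_line` is a hypothetical helper. Two caveats: quoted literals that span a line break are no longer replaced, and if the huge statement sits on a single line, that line still goes through the same expensive match.

```python
import re

# Same pattern as replace_variables() in oopstools/oops/helpers.py.
STRING_LITERAL = re.compile(r"'(?:\\\\|\\[^\\]|[^'])*'")


def replace_strings_per_line(s):
    """Apply the $STRING substitution one line at a time (sketch).

    Each regex call only sees one line (line endings kept), so the work per
    match is bounded by the longest line rather than the whole report.
    """
    return "".join(
        STRING_LITERAL.sub("$STRING", line) for line in s.splitlines(True)
    )
```

For example, `"a = 'x'\nb = 'y'\n"` becomes `"a = $STRING\nb = $STRING\n"`, while a literal split across two lines is left untouched.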

Another possibility is that the oops updater isn't very efficient on memory and this OOPS report simply shows that up. How much memory do we typically use per OOPS, and do we free it all after each OOPS?
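
One way to answer the per-OOPS memory question, sketched with Python's `tracemalloc` module. It needs Python 3.4 or later, so it would not run as-is on the Python 2.6 seen in the traceback. `profile_loads` is a hypothetical helper, `load` stands in for a per-report parser such as `Oops.from_pathname`, and only allocations made through Python's allocators are counted.

```python
import tracemalloc


def profile_loads(load, items):
    """Measure memory for load(item) on each item, one item at a time.

    Returns a list of (item, peak_bytes, retained_bytes) tuples: peak_bytes
    is the high-water mark during the call, and retained_bytes is what is
    still allocated after the returned object has been dropped.
    Assumes nothing else in the process is using tracemalloc.
    """
    report = []
    for item in items:
        tracemalloc.start()
        try:
            load(item)  # result is discarded immediately (CPython refcounting)
            retained, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()  # also clears the traces before the next item
        report.append((item, peak, retained))
    return report
```

A retained figure that keeps growing from one report to the next would suggest the updater is holding on to memory; a peak of several times the file size would point at the parsing itself.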

Diogo Matsubara (matsubara) wrote:

I talked to Robert about this bug on IRC. He helped me optimize the regex code a bit, but since the OOPS file is a massive 196 MB, we still got the MemoryError even with the optimization. I filed RT #43600 to deal with this in the short term by moving the OOPS file out of the oops directory and re-enabling the update_db script.

The log of the conversation I had with Rob can be found here: https://pastebin.canonical.com/42245/
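
For reference, one standard way to make this kind of pattern cheaper is the "unrolling the loop" rewrite. This is not necessarily the optimization discussed on IRC, and as reported above, a regex tweak alone may not be enough for a 196 MB input. In CPython's `re` engine, a repeated group with alternatives keeps backtracking state for each iteration, whereas a repeated single-character class such as `[^'\\]*` does not; the unrolled pattern only re-enters its group once per backslash escape. It is close to, but not identical with, the original: the original's `[^']` branch also lets a lone backslash act as an ordinary character, which the rewrite does not. `replace_strings_unrolled` is a hypothetical helper.

```python
import re

# Original pattern from replace_variables() in oopstools/oops/helpers.py.
ORIGINAL = re.compile(r"'(?:\\\\|\\[^\\]|[^'])*'")

# Unrolled form: a run of ordinary characters, then zero or more
# (backslash + any character, followed by another run). DOTALL lets an
# escaped newline count as an escape.
UNROLLED = re.compile(r"'[^'\\]*(?:\\.[^'\\]*)*'", re.DOTALL)


def replace_strings_unrolled(s):
    """Replace quoted literals with $STRING using the unrolled pattern."""
    return UNROLLED.sub("$STRING", s)
```

On well-formed input both patterns agree: `f('abc', 'it\'s')` becomes `f($STRING, $STRING)` either way.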

Robert Collins (lifeless) wrote:

12:58 < lifeless> matsubara: oh
12:58 < lifeless> matsubara: I have an idea
12:58 < lifeless> matsubara: what if... that 196M sql is becoming 800M unicode ?
12:58 < lifeless> matsubara: can you get the type of the 's' in that function ?
12:59 * lifeless adds to the bug
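
The arithmetic behind the 800M guess: on a UCS-4 ("wide") Python 2 build, a `unicode` object stores 4 bytes per code point regardless of content, so decoding 196 MB of ASCII SQL would need roughly 196 x 4 = 784 MB for the decoded copy alone. Python 3 (PEP 393) instead picks 1, 2 or 4 bytes per character based on the widest character in the string, which the sketch below uses to show the same 4x factor in CPython 3. `bytes_per_char` is a hypothetical helper. The pdb session in the next comment shows `s` was a plain `str`, so this turned out not to be the cause here.

```python
import sys


def bytes_per_char(s):
    """Storage bytes per character of a non-empty str in CPython 3.

    Doubling the string adds exactly len(s) characters of the same width,
    so the size difference divided by len(s) is the per-character cost,
    independent of the fixed object header.
    """
    return (sys.getsizeof(s * 2) - sys.getsizeof(s)) / len(s)


n = 1000000
ascii_sql = "x" * n                      # pure ASCII: 1 byte per char
wide_sql = "x" * (n - 1) + "\U0001F600"  # one non-BMP char: 4 bytes per char

print(bytes_per_char(ascii_sql), bytes_per_char(wide_sql))  # 1.0 4.0
```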

Diogo Matsubara (matsubara) wrote:

In [2]: Oops.from_pathname('/var/tmp/lperr/2011-01-24/50402.K967')
> /home/matsubara/devel/canonical/oops-tools/trunk/src/oopstools/oops/helpers.py(88)replace_variables()
-> s = re.sub(r"'(?:\\\\|\\[^\\]|[^'])*'", '$STRING', s)
(Pdb) type(s)
<type 'str'>
(Pdb) !s[:900]
'Statement: "SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache FROM BugTask, Bug WHERE Bug.id = BugTask.bug AND BugTask.sourcepackagename = 57520 AND BugTask.distribution = 1 AND Bug.fti @@ ftq(\'(0&00&0000&00000000&00001000&00002000&00003000&00004000&00005000&00006000&00007000&00008000&00009000&0000a000&0000b000&0000c000&0000d000&0000e000&0000f000&00012000&00013000&00014000&00015000&'
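
The pdb output shows that `s` is a byte `str`, and that the literal passed to `ftq()` is still open at the 900-character cut-off (the `\'` is just repr escaping), so the regex has to consume one very long quoted literal in a single match. A regex-free alternative is sketched below: `replace_string_literals` is a hypothetical helper that approximates the pattern's rules (a backslash escapes the next character; an unterminated literal is left alone) using only `str.find`, so its extra state is a few integer indices however long a literal is.

```python
def replace_string_literals(s, placeholder="$STRING"):
    """Replace every '...' literal in s with placeholder in one forward pass.

    Hypothetical, regex-free stand-in for the re.sub() in replace_variables().
    Assumed rules: a backslash escapes the next character, and an
    unterminated literal is copied through unchanged.
    """
    out = []
    i, n = 0, len(s)
    while i < n:
        start = s.find("'", i)
        if start == -1:                 # no more literals
            out.append(s[i:])
            break
        out.append(s[i:start])          # text before the literal
        j = start + 1
        close = s.find("'", j)          # candidate closing quote
        while close != -1:
            esc = s.find("\\", j, close)
            if esc == -1:               # no escape before it: literal ends here
                break
            j = esc + 2                 # skip backslash and escaped character
            if j > close:               # the candidate quote was escaped
                close = s.find("'", j)
        if close == -1:                 # unterminated: keep the rest verbatim
            out.append(s[start:])
            break
        out.append(placeholder)
        i = close + 1
    return "".join(out)
```

Each `find` call scans forward from where the previous one stopped, so the pass stays linear in the length of `s`.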

Diogo Matsubara (matsubara) wrote:

Setting to High since the workaround has been completed in RT #43600 and the update_db script is now unblocked.

Changed in oops-tools:
importance: Critical → High
affects: oops-tools → python-oops-tools
Robert Collins (lifeless) wrote:

pretty much a foot-gun, for now.

Changed in python-oops-tools:
importance: High → Low