+filebug is timing out when processing large blobs

Bug #357907 reported by Björn Tillenius on 2009-04-08
This bug affects 3 people
Affects: Launchpad itself
Importance: High
Assigned to: Unassigned

Bug Description

Sometimes +filebug times out when processing a blob. Almost all time is spent in Python, so it could be that the processing of the blob is inefficient. I haven't looked at exactly where the problem lies.

An example OOPS is OOPS-1194D3715, and the crash report it was processing is here: devpad.canonical.com:oops-1194D3715.crash

In particular this affects automatic crash reports submitted through apport. For example, whenever Firefox crashes with more than a certain number of tabs open, the crash dump is always big enough to cause a timeout while submitting to Launchpad. To properly gauge the impact of this bug, please consider that critical crash bugs in several Ubuntu applications (most notably Firefox) are not being submitted because of this issue. As a workaround, it's possible to select "Reduced crash report" in the apport UI, but then I don't think the apport retracers will be able to recover the stacktrace, so it's important that the user experiencing the crash had previously installed the debug symbols for Firefox on this machine.

tags: added: performance timeout
description: updated
Martin Olsson (mnemo) wrote :

I would appreciate if you increased (or at least considered increasing) the importance/priority of this bug.

description: updated
Changed in malone:
status: New → Confirmed
Graham Binns (gmb) on 2009-08-21
Changed in malone:
status: Confirmed → Triaged
importance: Undecided → High
milestone: none → 2.2.8
Matt Zimmerman (mdz) wrote :

This issue all but prevents kernel crash dumps from being submitted to Launchpad

Graham Binns (gmb) wrote :

mdz reports that this is causing problems with the new kernel crash reporting feature in Karmic. See OOPS-1329ED193 and OOPS-1329ED197.

I've targeted this to 2.2.8 since it's pretty high priority, but with the UI push we're having I don't know if it's something that we'll have time to fix immediately.

Graham Binns (gmb) on 2009-08-21
Changed in malone:
assignee: nobody → Graham Binns (gmb)
Deryck Hodge (deryck) on 2009-08-27
Changed in malone:
milestone: 2.2.8 → 3.0
Deryck Hodge (deryck) on 2009-09-15
Changed in malone:
milestone: 3.0 → 3.1.10
Graham Binns (gmb) on 2009-10-20
Changed in malone:
status: Triaged → In Progress
Graham Binns (gmb) wrote :

Moving this back to Triaged for now; will update presently with details of why the bug's occurring and proposed fixes.

Changed in malone:
status: In Progress → Triaged
Graham Binns (gmb) wrote :

We've done a lot of work on investigating this, and have at least found the cause.

The problem is that, when a bug is filed with a BLOB attachment, Launchpad automatically tries to parse the attachment line by line. This is done in a big while loop, and the OOPSes are occurring because the BLOB files from certain projects are large enough (~100MB in some cases) for the request timeout to kick in shortly after this while loop has completed but before the request has been completely dealt with by LP. This threw up a couple of red herrings when we were looking for the problem.
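
To illustrate the kind of loop described above, here is a minimal Python sketch. It is not Launchpad's actual parser: the function name `parse_blob_lines` and the simplified `Key: value` format with space-indented continuation lines are assumptions made for illustration only. The point is that the work grows with the number of lines in the blob, and all of it happens inside the web request.

```python
import io


def parse_blob_lines(stream):
    """Parse a simplified 'Key: value' blob one line at a time.

    Hypothetical helper, not Launchpad code. Continuation lines start
    with a space and are appended to the previous key's value.
    """
    fields = {}  # key -> list of value lines
    current_key = None
    while True:
        line = stream.readline()
        if not line:  # end of file
            break
        line = line.rstrip("\n")
        if line.startswith(" ") and current_key is not None:
            # Continuation of a multi-line value (e.g. an encoded core dump).
            fields[current_key].append(line[1:])
        elif ":" in line:
            current_key, _, value = line.partition(":")
            fields[current_key] = [value.strip()]
    return {key: "\n".join(parts) for key, parts in fields.items()}


sample = io.StringIO(
    "ProblemType: Crash\n"
    "Package: firefox\n"
    "CoreDump: base64\n"
    " AAAA\n"
    " BBBB\n"
)
report = parse_blob_lines(sample)
```

With a blob of around 100MB, a loop like this reads every line before the rest of the request can proceed, which is consistent with the timeout firing shortly after the loop ends.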

We have a few possible solutions to the problem:

 1. Create a script that processes apport data, and make it possible for the +filebug process to tell it "Hey, this LibraryFileAlias is mine, please process it and update this bug appropriately" after the bug has been filed.
 2. Process the apport data before the user is pointed at +filebug, so that the requisite data are available to +filebug via a series of queries instead of being locked away in a BLOB.
 3. A variation on option 1, whereby +filebug will only use the asynchronous method for files over a certain size (e.g. 25MB or so).
 4. Stop parsing once we have the salient data (i.e. bug summary, subscribers, tags, etc.) and finish parsing later.
 5. When the user hits +filebug, show a page with a spinner telling the user "Processing apport data...". This would fire a job to process the data, which we'd poll periodically. For the non-AJAX use case we'd have to do this using refreshes.
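
As a rough sketch of options 3 and 4 from the list above (not a proposed implementation): the 25MB threshold comes from option 3, while the function names and the key names `Summary`, `Subscribers` and `Tags` are hypothetical and chosen only for this example.

```python
import io

ASYNC_THRESHOLD_BYTES = 25 * 1024 * 1024  # "25MB or so", per option 3

# Assumed key names for the salient data; not Launchpad's real field names.
SALIENT_KEYS = frozenset({"Summary", "Subscribers", "Tags"})


def choose_processing_mode(blob_size_bytes, threshold=ASYNC_THRESHOLD_BYTES):
    """Option 3: process small blobs in the request, defer large ones."""
    return "async" if blob_size_bytes > threshold else "sync"


def parse_salient_fields(stream, wanted=SALIENT_KEYS):
    """Option 4: stop reading as soon as every wanted key has been seen."""
    found = {}
    lines_read = 0
    for line in stream:
        lines_read += 1
        key, sep, value = line.partition(":")
        if sep and key in wanted:
            found[key] = value.strip()
            if len(found) == len(wanted):
                break  # the rest of the blob can be parsed later
    return found, lines_read


blob = io.StringIO(
    "Summary: Firefox crashed\n"
    "Tags: apport-crash\n"
    "Subscribers: mnemo\n"
    "CoreDump: base64\n"
    " AAAA\n"
)
fields, lines_read = parse_salient_fields(blob)
mode = choose_processing_mode(100 * 1024 * 1024)
```

Early stopping only helps if the salient fields appear before the bulky attachment data, which is an assumption about the blob layout rather than something established in this report.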

Currently I'm favouring option 5, but we haven't made a decision yet.

Whatever we choose, we currently estimate that the work would take up to a week and a half to complete, so we're hoping to figure out the details of the best solution and start working on it in the first week of the 3.1.11 cycle (week beginning 9th November 2009).

Changed in malone:
milestone: 3.1.10 → 3.1.11
Deryck Hodge (deryck) wrote :

I realize this has slipped a couple cycles now, but we do now understand the problem and the fix that is needed. I just couldn't get two devs on it this month with a lazr-js sprint and UDS. This needs two devs pairing on it to ensure it is finished and done correctly.

Also, we won't target this for 3.1.12, since that is a two week cycle and holidays abound.

We will, however, do this in January. I'll be extremely disappointed if we don't land a fix for this in January's cycle.

Changed in malone:
milestone: 3.1.11 → none
Graham Binns (gmb) on 2009-12-11
Changed in malone:
assignee: Graham Binns (gmb) → nobody
Graham Binns (gmb) on 2010-01-11
Changed in malone:
assignee: nobody → Graham Binns (gmb)
Graham Binns (gmb) on 2010-01-11
Changed in malone:
status: Triaged → In Progress
Graham Binns (gmb) wrote :

Moving this back to triaged because tracking all the work on this in one bug is impractical.

Instead, we'll consider this bug fixed when the bugs in the story-blob-processing tag have been fixed (http://bugs.launchpad.net/malone/+bugs?field.tag=story-blob-processing). For a description of how we plan to fix this problem, see https://dev.launchpad.net/Bugs/BetterBlobProcessing.

Changed in malone:
assignee: Graham Binns (gmb) → nobody
status: In Progress → Triaged
tags: removed: performance
Stuart Bishop (stub) wrote :

Bug #609564 (Timeout configurable per pageid) would provide a workaround for this.

Robert Collins (lifeless) wrote :

All the (http://bugs.launchpad.net/malone/+bugs?field.tag=story-blob-processing) bugs are done except for one totally-cosmetic one; this timeout cannae happen no more.

Changed in malone:
status: Triaged → Fix Released