whoopsie-upload-all has a wrong check for whether the upload is done

Bug #1354318 reported by Steve Langasek
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
apport (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

Currently, the way whoopsie-upload-all works is that, whenever a new .crash file is seen, /etc/init/apport-noui.conf triggers to run whoopsie-upload-all from a single-threaded job that iterates through all as-yet-unprocessed .crash files, process them via apport, marks them for upload, and then waits for apport to upload them.

There are several problems with this:

 - The "wait for apport to upload them" is racy. Once a report is submitted, it is legitimate for something to remove all of the related files (.crash, .upload, .uploaded). On our test infrastructure, that has resulted in buggy behavior because the test harness is removing these files /while whoopsie-upload-all is polling/, causing it to poll for a very long time waiting for a file that will never be found and then time out.
 - waiting for apport to upload is pointless anyway for its designed use, since whoopsie-upload-all is called from an upstart job that's inotify-triggered so nothing waits for the return from this command anyway. The utah scripts appear to currently be waiting for it; however this should be non-default, exceptional behavior, because...
 - since apport-noui is not an 'instance' job, as long as the job is running (presumably blocking on files that are never going to show up), *newly* created .crash files will not be processed.
 - the job furthermore is written to assume that it will be called separately for crashes done as different users. However, since whoopsie-upload-all will process all the crashes that it can each time that it runs, there is a race condition whereby a crash written as non-root will be processed by whoopsie-upload-all running as root, resulting in wrong permissions on the related files anyway.

Proposed solution to this:

 - Change apport-noui to be 'instance $MATCH'. This will result in a separate whoopsie-upload-all process spawned each time a new .crash is created.
 - Keep the current semantics of whoopsie-upload-all iterating over *all* .crash files that it finds and submitting them. This ensures that if a previous crash is missed out for any reason, we manage to catch it later.
 - Ensure that process_report() can be safely called from multiple processes at the same time on the same .crash file (by locking with correct semantics and having subsequent calls skip)
 - Handle any necessary file uid changes /within/ whoopsie-upload-all, instead of relying on a racy sudo invocation from the upstart job.
 - Disable the current wait_uploaded() behavior unless a timeout argument is given explicitly; and also fix the behavior to detect when the .upload file has been removed after we started watching.

I believe with those changes, whoopsie-upload-all should give us reliable report processing in all cases.

Related branches

Revision history for this message
Martin Pitt (pitti) wrote :

Quick notice: whoopsie-upload-all was never designed to be called in parallel from an upstart job. It was a script that we want to run at the end of merge proposals etc. in the CI machinery to upload all crashes that occurred during the test run. For that reason it must wait until everything has been uploaded, otherwise the testbed will be torn down and the .crash is lost.

For upstart we could call this with a --timeout=0, so that it behaves asynchronously. If we rely on the upstart job in the CI machinery now, we instead need a new script which does the "wait until everything is uploaded".

Adding flock()ing to the .crash files in process_report() sounds fine to me.

Revision history for this message
Brian Murray (brian-murray) wrote :

whoopsie-upload-all now returns an upload_stamp for already processed crash files (.crash files with a .upload file) in addition to returning an upload_stamp for ones which it just did data collection. Subsequently, one can call whoopsie-upload-all --timeout 300 and wait_uploaded will now wait for very .upload file without a .uploaded file. This is what the CI machinery should run to "wait until everything is uploaded".

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package apport - 2.14.7-0ubuntu3

---------------
apport (2.14.7-0ubuntu3) utopic; urgency=medium

  [ Steve Langasek ]
  * Refactor apport-noui/whoopsie-upload-all to behave more reliably in
    case of overlapping crash processing (LP: #1354318):
    - debian/apport-noui.upstart: refactor to make this an 'instance' job
      for each incoming .crash file, and drop the racy handling of non-root
      .crash files (as well as the unnecessary 'env MATCH' line).
    - data/whoopsie-upload-all: refactor report processing to ensure that
      whoopsie-upload-all can be called multiple times in parallel without
      causing any .crash file to be processed more than once.
    - data/whoopsie-upload-all: handle setting ownership of files in
      process_report() instead of relying on this script being called by a
      particular user.
    - data/whoopsie-upload-all: don't spin in wait_uploaded() watching for
      .uploaded files if the corresponding .upload file has been removed out
      from under us.
    - data/whoopsie-upload-all: by default, return immediately instead of
      waiting to see if whoopsie processes all of the crashes.

  [ Brian Murray ]
  * data/whoopsie-upload-all: indicate that all reports have been uploaded
    even those that were marked for upload earlier.
 -- Brian Murray <email address hidden> Thu, 02 Oct 2014 08:33:49 -0700

Changed in apport (Ubuntu):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.