merge-proposal-jobs is "hanging", apparently with nothing to do

Bug #605772 reported by Steve McInerney
This bug affects 1 person
Affects: Launchpad itself
Status: Expired
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

merge-proposal-jobs was apparently hung.
No activity showed in top or under strace; the process was just stuck in a select().

Logs:
2010-07-15 05:28:14 INFO Running through Twisted.
2010-07-15 05:28:23 DEBUG Removing lock file: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:29:03 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:29:10 INFO Running through Twisted.
2010-07-15 05:29:20 DEBUG Running <UPDATE_PREVIEW_DIFF job for merge 29547 on ~bilalakhtar/ubuntu/maverick/selinux-basics/merge-603595>, lease expires 2010-07-15 05:39:11.125968+00:00
2010-07-15 05:29:38 DEBUG Finished <UPDATE_PREVIEW_DIFF job for merge 29547 on ~bilalakhtar/ubuntu/maverick/selinux-basics/merge-603595>
2010-07-15 05:30:08 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:30:08 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
2010-07-15 05:31:03 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:31:03 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
....repeats...
2010-07-15 06:23:03 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 06:23:03 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
2010-07-15 06:24:04 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 06:24:04 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
Unhandled error in Deferred:
Traceback (most recent call last):
Failure: twisted.internet.error.ProcessDone: A process has ended without apparent errors: process finished with exit code 0.
Unhandled error in Deferred:
Traceback (most recent call last):
  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 371, in _runCallbacks

  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 594, in _cbDeferred

  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 280, in callback

  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 354, in _startRunCallbacks

--- <exception caught here> ---
  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/defer.py", line 371, in _runCallbacks

  File "/srv/bzrsyncd.launchpad.net/production/launchpad-rev-9521/lib/lp/services/job/runner.py", line 403, in <lambda>
    deferred.addBoth(lambda ignored: reactor.stop())
  File "/home/pqm/for_rollouts/production/eggs/Twisted-10.0.0-py2.5-linux-x86_64.egg/twisted/internet/base.py", line 552, in stop

twisted.internet.error.ReactorNotRunning: Can't stop reactor that isn't running.
2010-07-15 06:24:52 INFO Ran 1 UpdatePreviewDiffJob jobs.
2010-07-15 06:25:01 DEBUG Removing lock file: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 06:25:09 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 06:25:25 INFO Running through Twisted.
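
A rough way to quantify the hang from the log above is to measure how long the lock file was held. The sketch below is only an illustration: lock_hold_times is a hypothetical helper, and it assumes that a "Creating lockfile" line followed by "Running through Twisted." means the lock was actually acquired, while one followed by "in use" means it was not.

```python
from datetime import datetime, timedelta

# Excerpt of the log lines quoted in this report (repeats omitted).
LOG_EXCERPT = """\
2010-07-15 05:29:03 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:29:10 INFO Running through Twisted.
2010-07-15 05:30:08 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 05:30:08 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
2010-07-15 06:24:04 INFO Creating lockfile: /var/lock/launchpad-merge-proposal-jobs.lock
2010-07-15 06:24:04 DEBUG Lockfile /var/lock/launchpad-merge-proposal-jobs.lock in use
2010-07-15 06:25:01 DEBUG Removing lock file: /var/lock/launchpad-merge-proposal-jobs.lock
"""


def lock_hold_times(lines):
    """Return a list of timedeltas, one per acquire/release pair found."""
    candidate = None  # timestamp of a lock attempt not yet confirmed
    acquired = None   # timestamp of the lock currently believed held
    held = []
    for line in lines:
        parts = line.strip().split(" ", 3)
        if len(parts) < 4:
            continue  # traceback lines and other non-log lines
        try:
            stamp = datetime.strptime(parts[0] + " " + parts[1],
                                      "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        message = parts[3]
        if message.startswith("Creating lockfile"):
            candidate = stamp
        elif message.endswith("in use"):
            candidate = None  # attempt failed, another run holds the lock
        elif message.startswith("Running through Twisted") and candidate:
            # Heuristic (assumption): the run started, so the lock was taken.
            acquired, candidate = candidate, None
        elif message.startswith("Removing lock file") and acquired:
            held.append(stamp - acquired)
            acquired = None
    return held


if __name__ == "__main__":
    for duration in lock_hold_times(LOG_EXCERPT.splitlines()):
        print(duration)
```

For the excerpt this reports the lock as held for roughly 56 minutes (05:29:03 to 06:25:01), even though the one job in that run finished at 05:29:38.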

The crash (the tracebacks above) occurred when a kill was sent to the process to get things flowing again.
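
The ReactorNotRunning error in the traceback comes from the unconditional reactor.stop() call in the addBoth callback in runner.py. One possible guard is sketched below; it is not necessarily the fix that was applied in Launchpad. It assumes the reactor exposes a boolean `running` attribute, as Twisted's reactor does. StubReactor, the local ReactorNotRunning class, and stop_if_running are hypothetical stand-ins so the snippet runs without Twisted installed.

```python
class ReactorNotRunning(Exception):
    """Stand-in for twisted.internet.error.ReactorNotRunning."""


class StubReactor:
    """Hypothetical minimal reactor with only `running` and `stop()`."""

    def __init__(self, running):
        self.running = running
        self.stop_calls = 0

    def stop(self):
        if not self.running:
            raise ReactorNotRunning("Can't stop reactor that isn't running.")
        self.running = False
        self.stop_calls += 1


def stop_if_running(reactor):
    """Stop the reactor only if it is running.

    Returns True if stop() was issued, False if there was nothing to stop.
    """
    if reactor.running:
        reactor.stop()
        return True
    return False


if __name__ == "__main__":
    reactor = StubReactor(running=True)
    print(stop_if_running(reactor))  # reactor was running, so it is stopped
    print(stop_if_running(reactor))  # second call is a harmless no-op
```

With a real Twisted reactor the callback would then read something like deferred.addBoth(lambda ignored: stop_if_running(reactor)), so a Deferred that fires after the reactor has already shut down no longer raises.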

Steve McInerney (spm)
tags: added: canonical-losa-lp
Paul Hummer (rockstar)
Changed in launchpad-code:
status: New → Triaged
importance: Undecided → High
Steve McInerney (spm) wrote :

fwiw, seen again just a few minutes ago.

Robert Collins (lifeless) wrote :

Please get a python backtrace using gdb, thanks.

Changed in launchpad-code:
status: Triaged → Incomplete
Steve McInerney (spm) wrote :

yay. backtrace: https://pastebin.canonical.com/36493/

we do have a full gcore in ~/core.3577.gz on bzrsyncd@loganberry if desirable.

Changed in launchpad-code:
status: Incomplete → New
Aaron Bentley (abentley)
Changed in launchpad-code:
status: New → Triaged
Aaron Bentley (abentley) wrote :

Escalating because this causes production issues.

Changed in launchpad:
importance: High → Critical
Aaron Bentley (abentley) wrote :

This may no longer be an issue, due to r13943.

Changed in launchpad:
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Launchpad itself because there has been no activity for 60 days.]

Changed in launchpad:
status: Incomplete → Expired