bzr imports are sometimes stuck in 'running' state
Bug #434192 reported by Данило Шеган
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Launchpad itself | Fix Released | High | Jeroen T. Vermeulen | |
Bug Description
Sometimes the rosettabranches.py script gets stuck in the 'running' state, and subsequent runs then cannot start. This points to a problem in the code job system, but it affects us the most.
Related branches
lp:~thumper/launchpad/job-fail
- Brad Crittenden (community): Approve (release-critical)
- Michael Hudson-Doyle: Approve
- Diff: 51 lines, 2 files modified
  lib/lp/services/job/runner.py (+2/-0)
  lib/lp/services/job/tests/test_runner.py (+21/-0)
Changed in rosetta: | |
assignee: | nobody → Jeroen T. Vermeulen (jtv) |
description: | updated |
Changed in rosetta: | |
milestone: | 3.0 → 3.1.10 |
Changed in rosetta: | |
assignee: | Jeroen T. Vermeulen (jtv) → Tim Penhey (thumper) |
assignee: | Tim Penhey (thumper) → Jeroen T. Vermeulen (jtv) |
status: | Triaged → Fix Committed |
Changed in rosetta: | |
status: | Fix Committed → Fix Released |
Tim has found the problem. The first thing that happens to a job when it's beginning to be processed is that it's marked as running; the transaction is committed to make this visible.
When a job fails, the transaction is aborted, a new one is implicitly started, and the job is marked as failed in that new transaction. Then the exception is re-raised. It's caught one call level up, where the error is appended to a list of failures. Then the next job is processed, which as a side effect commits the previous job's failure mark.
But what happens at the end, when there is no next job? I looked at this and stupidly discarded it as something that would have been noticed. At the end, the script registers oopses for any failures. This was failing because of missing configuration values. And at that point there's nothing to catch and handle the exception, so the script borks out. The transaction is never committed, and so that final failure is not recorded.
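The sequence described above can be reproduced in a small, self-contained sketch. This is not the Launchpad code: `FakeStore`, `run_job`, `run_all`, and the `commit_failure` flag are hypothetical names invented for illustration, and committing right after the failure mark is only an assumption about what a fix could look like, not a description of the linked branch.

```python
class FakeStore:
    """Toy stand-in for a transactional database (not Launchpad's API)."""

    def __init__(self):
        self.committed = {}  # state visible to other processes
        self.pending = {}    # changes in the current transaction

    def set_status(self, job_id, status):
        self.pending[job_id] = status

    def commit(self):
        self.committed.update(self.pending)
        self.pending.clear()

    def abort(self):
        self.pending.clear()


def run_job(store, job_id, work, commit_failure):
    # Mark the job as running and commit so the state is visible.
    store.set_status(job_id, "running")
    store.commit()
    try:
        work()
    except Exception:
        # Drop the failed job's changes, then record the failure
        # in the implicitly started new transaction.
        store.abort()
        store.set_status(job_id, "failed")
        if commit_failure:
            # Assumed fix: do not rely on a later job to commit this.
            store.commit()
        raise
    store.set_status(job_id, "completed")
    store.commit()


def run_all(store, jobs, commit_failure):
    failures = []
    for job_id, work in jobs:
        try:
            run_job(store, job_id, work, commit_failure)
        except Exception as error:
            failures.append((job_id, error))
    return failures


def failing_work():
    raise RuntimeError("job blew up")


def simulate(commit_failure):
    store = FakeStore()
    jobs = [(1, lambda: None), (2, failing_work)]  # the last job fails
    run_all(store, jobs, commit_failure)
    # Stand-in for the script dying while reporting failures:
    # whatever is still pending is lost.
    store.abort()
    return store.committed


print(simulate(commit_failure=False))  # job 2 is left as 'running'
print(simulate(commit_failure=True))   # job 2 is recorded as 'failed'
```

With `commit_failure=False`, the failure mark of the last job is never committed, so the job stays in the 'running' state, which matches the symptom in the bug description. If the failing job were followed by another job, the next job's commit of its own 'running' mark would carry the pending failure mark along, which is why the problem only shows up at the end of a run.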