branch scanner rlimit failures cause the next branch to be incorrectly scanned and fail

Bug #786804 reported by Jean-Paul Calderone on 2011-05-23
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Aaron Bentley

Bug Description

r2675 of https://code.launchpad.net/~divmod-dev/divmod.org/trunk was never properly scanned. On #launchpad:

  < wgrant> 2011-05-21 20:24:08 INFO Updating branch scanner status: 2675 revs
  < wgrant> Fatal Python error: deletion of interned StaticTuple failed
  < wgrant> Aborted
  < wgrant> Looks like bzrlib exploded.
  < wgrant> Oh, no, it's actually a MemoryError.
  < wgrant> exarkun: A bug would be good. What happened here is that a kernel branch caused the scanner to hit its rlimit, but scan_branches.py didn't notice, so it continued trying to execute more jobs, and yours was next.

divmod.org/trunk will probably be fixed by the next commit, but it would be good if Launchpad handled this case better on its own.
OOPS-1965SMS9


Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
tags: added: oops
summary: - branch scanner fails with MemoryError leaving branch page in intermediate state
+ branch scanner rlimit failures cause the next branch to be incorrectly scanned and fail
description: updated
Robert Collins (lifeless) wrote :

This is fallout from a change we made to stop things swapping and taking down the machine. It is a regression because previously only the problematic branches got stomped on.

tags: added: regression
tags: removed: regression
tags: added: regression
Martin Pool (mbp) wrote :

The fallout is from bug 690021.

It seems there are a few possibilities:

1- outright revert the imposition of the ulimit
2- make the ulimit come from a feature flag and then configure it to unlimited (probably good anyhow, assuming it's easy to check flags at the time it's needed)
3- make sure it's the problem branch that gets killed, not the following one

I would be inclined to do 2 and then 3.

Something like this could be behind <https://bugs.launchpad.net/bugs/761664> - although not this particular change, because that bug was reported before my ulimit change landed.
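Option 2 above could look roughly like this minimal sketch. It assumes a plain dictionary of flag values in place of Launchpad's real feature-flag machinery, and the flag name `codehosting.scanner.memory_limit` and both helper functions are hypothetical. The flag value is either `unlimited` (or absent) or a byte count, and only the soft `RLIMIT_AS` is changed so the hard limit stays where the operator left it.

```python
import resource

# Hypothetical flag name; the report does not say what key Launchpad would use.
MEMORY_LIMIT_FLAG = "codehosting.scanner.memory_limit"


def scanner_memory_limit(flags):
    """Return the scanner's address-space cap in bytes, or None for no cap.

    `flags` is a plain mapping standing in for a feature-flag lookup. A missing
    flag or the value "unlimited" disables the cap; anything else is read as a
    positive byte count.
    """
    value = str(flags.get(MEMORY_LIMIT_FLAG, "unlimited")).strip().lower()
    if value == "unlimited":
        return None
    limit = int(value)
    if limit <= 0:
        raise ValueError("memory limit must be a positive number of bytes")
    return limit


def apply_scanner_memory_limit(flags):
    """Set the soft RLIMIT_AS from the flag, leaving the hard limit untouched."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = scanner_memory_limit(flags)
    if soft is None:
        soft = hard  # "unlimited": allow as much as the hard limit permits
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

If `apply_scanner_memory_limit` were called when the scanner starts, switching the flag to `unlimited` would take effect on the next run without a code change. It does not address option 3, though: whatever the value, the limit still covers every job that one process runs.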

Robert Collins (lifeless) wrote :

The ulimit merely changes the failure mode; it's a good thing to have. I would focus directly on 3 here.

Aaron Bentley (abentley) wrote :

Since the branch scans are jobs, it seems reasonable to use the TwistedJobRunner to isolate each job in its own process, then set the rlimit for the jobs themselves.
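A minimal sketch of that idea, assuming a plain subprocess per job rather than the TwistedJobRunner itself (whose API is not shown in this report). The limit is set in the child between fork and exec, so a job that exhausts it fails on its own and the parent simply moves on to the next one. `run_job_isolated`, `run_jobs` and the 512 MiB cap are hypothetical, and the job bodies are stand-in Python snippets, not real branch scans.

```python
import resource
import subprocess
import sys

# Hypothetical per-job address-space cap; the real value is not given here.
JOB_MEMORY_LIMIT = 512 * 1024 * 1024  # 512 MiB


def _limit_child_memory():
    """Runs in the forked child just before exec, so only the job is capped."""
    resource.setrlimit(resource.RLIMIT_AS, (JOB_MEMORY_LIMIT, JOB_MEMORY_LIMIT))


def run_job_isolated(job_code):
    """Run one job in a fresh interpreter and report whether it succeeded.

    `job_code` is a Python source string standing in for a branch scan.
    """
    result = subprocess.run(
        [sys.executable, "-c", job_code],
        preexec_fn=_limit_child_memory,
        capture_output=True,
    )
    return result.returncode == 0


def run_jobs(jobs):
    """Run each job in turn; a job that hits its limit fails alone."""
    return {name: run_job_isolated(code) for name, code in jobs.items()}


if __name__ == "__main__":
    jobs = {
        # Asks for 1 GiB under a 512 MiB cap, so the child dies with MemoryError.
        "huge-branch": "bytearray(1024 ** 3)",
        # Well within the cap, so it completes normally.
        "small-branch": "x = sum(range(1000))",
    }
    print(run_jobs(jobs))
```

Because each job gets a fresh interpreter, a MemoryError or abort in one scan cannot leave damaged state behind for the next job, which is the failure wgrant described on IRC.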

Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Aaron Bentley (abentley)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Aaron Bentley (abentley) on 2011-06-22
tags: added: qa-untestable
removed: qa-needstesting
William Grant (wgrant) on 2011-06-27
Changed in launchpad:
status: Fix Committed → Fix Released