Death row processing is currently taking a very long time

Bug #430552 reported by Julian Edwards
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Celso Providelo

Bug Description

Currently, process-death-row.py is taking in excess of three hours to run. This causes the PG transaction killer to kick in, which makes the script bail out early.

Because it takes so long, the reaping done via the loop tuner also conflicts with the next run of the publisher, which is a possible cause of an exception. We need to:
 a) find out why it takes so long
 b) look at optimising the file deletion stage
 c) stop the publisher temporarily and let it run to completion

tags: added: oops soyuz-publish
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Julian Edwards (julian-edwards)
Julian Edwards (julian-edwards) wrote :

It turns out that the txn killer was only warning, not killing, so the exception is most likely the reason for the p-d-r script not completing. cjwatson advised that we can stop the publisher tomorrow to let p-d-r run to completion with no interference.

Julian Edwards (julian-edwards) wrote :

The failure is a genuine exception, which still happens when the publisher is stopped while p-d-r runs.

Traceback (most recent call last):
  File "/srv/launchpad.net/codelines/soyuz-production-rev-8337/lib/lp/soyuz/scripts/processdeathrow.py", line 72, in processDeathRow
    death_row.reap(self.options.dry_run)
  File "/srv/launchpad.net/codelines/soyuz-production-rev-8337/lib/lp/archivepublisher/deathrow.py", line 97, in reap
    records = self._tryRemovingFromDisk(source_files, binary_files)
  File "/srv/launchpad.net/codelines/soyuz-production-rev-8337/lib/lp/archivepublisher/deathrow.py", line 293, in _tryRemovingFromDisk
    bytes += self._removeFile(
  File "/srv/launchpad.net/codelines/soyuz-production-rev-8337/lib/lp/archivepublisher/diskpool.py", line 449, in removeFile
    return entry.removeFile(component)
  File "/srv/launchpad.net/codelines/soyuz-production-rev-8337/lib/lp/archivepublisher/diskpool.py", line 245, in removeFile
    assert component == self.file_component
AssertionError

Changed in soyuz:
assignee: Julian Edwards (julian-edwards) → Celso Providelo (cprov)
milestone: none → 3.0
Julian Edwards (julian-edwards) wrote :

We're running a dry-run with a cowboyed change in the script to help identify the problematic file. Once that's done, we can probably repair the archive so subsequent runs will work.
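
As a rough illustration of that kind of dry-run instrumentation (not the actual cowboyed change, which is not shown in this report), the sketch below catches the AssertionError per file instead of letting it abort the whole run, and records which file tripped it. The names find_problem_files, remove_file and fake_remove_file are hypothetical stand-ins, not Launchpad APIs.

```python
import logging

logger = logging.getLogger("death-row-dry-run")


def find_problem_files(candidates, remove_file, dry_run=True):
    """Return the (component, filename) pairs whose removal asserts.

    `remove_file` is a hypothetical callable standing in for the pool's
    removal routine. It is assumed to raise AssertionError when the
    component passed in does not match the one recorded for the file.
    """
    problems = []
    for component, filename in candidates:
        try:
            remove_file(component, filename, dry_run=dry_run)
        except AssertionError:
            # Log the offender and keep going so one bad file
            # does not hide any others later in the run.
            logger.warning(
                "component mismatch for %s (asked for %s)", filename, component)
            problems.append((component, filename))
    return problems


def fake_remove_file(component, filename, dry_run=True):
    """Illustrative stand-in: asserts like the failing frame in the traceback."""
    file_component = {"foo_1.0.dsc": "main", "bar_2.0.dsc": "universe"}
    assert component == file_component[filename]
    return 0  # bytes removed; nothing is touched in this sketch


if __name__ == "__main__":
    candidates = [("main", "foo_1.0.dsc"), ("main", "bar_2.0.dsc")]
    print(find_problem_files(candidates, fake_remove_file))
```

With the offending files listed this way, the archive records can be inspected and repaired before the next real run.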

Changed in soyuz:
milestone: 3.0 → 3.1.10
Celso Providelo (cprov)
Changed in soyuz:
status: Triaged → In Progress
Diogo Matsubara (matsubara) wrote : Bug fixed by a commit
Changed in soyuz:
status: In Progress → Fix Committed
Celso Providelo (cprov)
Changed in soyuz:
milestone: 3.1.10 → 3.0
Changed in soyuz:
status: Fix Committed → Fix Released