Racing package copies crash when trying to create duplicate PackageDiffs

Bug #1475358 reported by Colin Watson on 2015-07-16
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
Unassigned

Bug Description

Michael Terry encountered https://oops.canonical.com/?oopsid=OOPS-994992dc17952480fa06369f6ecd9f27 when trying to copy ubuntu-snappy from https://launchpad.net/~snappy-dev/+archive/ubuntu/tools/+packages into https://launchpad.net/~snappy-dev/+archive/ubuntu/snapcraft-daily/+packages for a second series. The traceback is:

  NotOneError: one() used with more than one result available

    Traceback (most recent call last):
  Module lazr.jobrunner.jobrunner, line 194, in runJobHandleError
    self.runJob(job, fallback)
  Module lp.services.job.runner, line 289, in runJob
    super(BaseJobRunner, self).runJob(IRunnableJob(job), fallback)
  Module lazr.jobrunner.jobrunner, line 162, in runJob
    job.run()
  Module lp.soyuz.model.packagecopyjob, line 578, in run
    self.attemptCopy()
  Module lp.soyuz.model.packagecopyjob, line 675, in attemptCopy
    to_spr.requestDiffTo(self.requester, from_spr)
  Module lp.soyuz.model.sourcepackagerelease, line 392, in requestDiffTo
    candidate = self.getDiffTo(to_sourcepackagerelease)
  Module lp.soyuz.model.sourcepackagerelease, line 388, in getDiffTo
    from_source=self, to_source=to_sourcepackagerelease)
  Module storm.sqlobject, line 356, in selectOneBy
    return SQLObjectResultSet(cls, by=kwargs)._one()
  Module storm.sqlobject, line 516, in _one
    return detuplelize(self._result_set.one())
  Module storm.store, line 1164, in one
    raise NotOneError("one() used with more than one result available")
NotOneError: one() used with more than one result available

This is not the only case where that assumption has been violated. On dogfood:

launchpad_dogfood=# SELECT from_source, to_source FROM PackageDiff GROUP BY from_source, to_source HAVING count(*) > 1;
 from_source | to_source
-------------+-----------
     1519830 | 1683639
     1779264 | 2249949
     2019347 | 2136423
     2056344 | 2180826
     2060701 | 2190900
     2135053 | 2294133
     2182041 | 2248667
     2216681 | 2522809
     2221945 | 2225689
     2255693 | 2526395
     2299873 | 2329922
     2347148 | 2463332
     2361222 | 2933815
     2377650 | 2380373
     2382291 | 2398455
     2398455 | 2403606
     2403606 | 2417605
     2426936 | 2491921
     2456157 | 2490192
     2465643 | 2679868
     2473480 | 2506189
     2474803 | 2529556
     2492042 | 2515229
     2494823 | 2573844
     2517088 | 2538292
     2519613 | 2860033
     2558133 | 2564990
     2570147 | 2574992
     2573527 | 2636233
     2595838 | 2661580
     2601601 | 2636312
     2640691 | 2744538
     2653041 | 2773771
     2670797 | 2684107
     2700705 | 2716147
     2707487 | 2802284
     2722048 | 2723896
     2723896 | 2732290
     2724095 | 2738758
     2737304 | 2837172
     2741396 | 2773772
     2745586 | 2757232
     2745631 | 2745586
     2781747 | 2798061
     2786820 | 2816997
     2812617 | 2870643
     2816783 | 2860055
     2822120 | 2873329
     2841275 | 2871595
     2847301 | 2899488
     2853331 | 2868033
     2853331 | 2881763
     2856964 | 2893031
     2872666 | 2875226
(54 rows)

And using that information I tracked down an example on production where you can see two diffs for the same from/to pair:

  https://launchpad.net/~ricotz/+archive/ubuntu/testing/+sourcepub/2698675/+listing-archive-extra

The timestamps of each group from "SELECT * FROM PackageDiff NATURAL JOIN (SELECT from_source, to_source FROM PackageDiff GROUP BY from_source, to_source HAVING count(*) > 1) AS dups ORDER BY from_source, to_source, date_requested;" are all pretty close together: the largest gap I saw was under 0.4 seconds. I think therefore that this is probably a race in SourcePackageRelease.requestDiffTo: it checks whether a candidate has already been requested, but it's perfectly possible for multiple simultaneous transactions to pass this test and create the same diff before committing the transaction. Indeed, for instance, it's not entirely improbable in the case of the same package being copied to multiple series at the same time.

I think we should at least have a unique constraint on from_source/to_source; perhaps the simplest code fix would then be to try store.flush() right after creating the PackageDiff object and check for IntegrityError, which is a pattern we use in a couple of other places. Regardless, once we have a fix in place we should clean up the existing duplicate rows as well.

Related branches

Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → William Grant (wgrant)
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → In Progress
William Grant (wgrant) on 2016-05-06
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant) wrote :

The duplicates have been deleted from production, and a new unique index added to prevent more from showing up. The racing copies will now crash without corrupting the DB, and a simple retry should work.

Changed in launchpad:
assignee: William Grant (wgrant) → nobody
status: In Progress → Triaged
summary: - Package diffs should be unique by from_source/to_source but aren't
+ Racing package copies crash when trying to create duplicate PackageDiffs
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers