BugTask:+index timeout - death by sql in PersonSubscriptions(user, bug)

Bug #811447 reported by Andreas Moog
38
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Gary Poster

Bug Description

https://bugs.launchpad.net/ubuntu/+source/samba4/+bug/728840 is giving me constant timeouts, example OOPS:

OOPS-2022AU38
OOPS-2022G63
OOPS-2022A66
OOPS-2023AV26
OOPS-2023H30
OOPS-2023D24

I haven't been able to load the bug in a browser at all, accessing the bug via launchpadlib-API is no problem.

 Module canonical.launchpad.webapp.publisher, line 303, in __call__
    self.initialize()
  Module lp.bugs.browser.bug, line 626, in initialize
    psi = PersonSubscriptions(user, bug)
  Module lp.bugs.model.personsubscriptioninfo, line 156, in __init__
    self.loadSubscriptionsFor(person, bug)
  Module lp.bugs.model.personsubscriptioninfo, line 240, in loadSubscriptionsFor
    self._getDirectAndDuplicateSubscriptions(person, bug))
  Module lp.bugs.model.personsubscriptioninfo, line 205, in _getDirectAndDuplicateSubscriptions
    subscriber, subscribed_bug, subscription)
  Module lp.bugs.model.personsubscriptioninfo, line 75, in add
    if principal in self.administrated_teams:
  Module storm.sqlobject, line 552, in __contains__
    return item in result_set

is triggering late lookups many times over.

Tags: qa-ok timeout

Related branches

summary: - Constant Timeouts on bug 728840
+ BugTask:+index timeout - death by sql in PersonSubscriptions(user, bug)
description: updated
tags: added: timeout
Revision history for this message
Martin Pool (mbp) wrote :

Interestingly, and as a partial workaround, this works consistently when not logged in.

Revision history for this message
Robert Collins (lifeless) wrote :

Thats because when you aren't logged in you can't subscribe/unsubscribe others. The page is doing the work to determine that up-front on all page renders.

Aaron Bentley (abentley)
Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Gary Poster (gary) wrote :

This page loads fine and reports "120 queries/external actions issued in 3.55 seconds" for me now, authenticated. I looked for similar OOPS on devpad but didn't find any before I decided to focus on another bug. I'm half-inclined to mark this Fix Released, but I'll try to get someone on the yellow squad to verify that timeouts like this have disappeared.

Changed in launchpad:
assignee: nobody → Данило Шеган (danilo)
status: Triaged → In Progress
Revision history for this message
Andreas Moog (ampelbein) wrote :

Still the same for me, Gary, example OOPS-Codes:

OOPS-2043AS41
OOPS-2043AY36
OOPS-2043CD49

Changed in launchpad:
assignee: Данило Шеган (danilo) → nobody
status: In Progress → Triaged
Revision history for this message
Данило Шеган (danilo) wrote :

I was going to assume this is an issue of cold DB cache, but I actually doubt that now: did you manage to get it loaded at all for yourself, Andreas?

Least we can do is make the two slow, almost identical queries a single one (they differ in DISTINCT presense and what is being selected, but are otherwise compatible): http://pastebin.ubuntu.com/659339/ and http://pastebin.ubuntu.com/659341/

The way it works for me is that the two queries, instead of being 1200+ms each are roughly 150ms each as can be seen in OOPS-2043H61. Identical query when ran on staging took >100s (not ms, actual seconds)! Warm-cache run afterwards took 2.7s.

Revision history for this message
Данило Шеган (danilo) wrote :

To clarify, this query is particularly slow for the given set of circumstances: user being logged in, bug with 450 duplicates, user's team subscribed to 60 of them.

Revision history for this message
Deryck Hodge (deryck) wrote :

Bug 825571, which I duped against this one, has more OOPS linked showing this same problem for another user and a different bug.

Revision history for this message
Paul Sladen (sladen) wrote :

More OOPSen:

  OOPS-2064O33
  OOPS-2064DR16
  OOPS-2064L23

These are generally also bugs that Apport is directing people to (I believe a dialogue box pops up "See see the box that has been loaded in the browser").

Revision history for this message
Gary Poster (gary) wrote :

OK, this bug is a bit of "death by a thousand paper cuts," but looking at the OOPS, it looks like we can cut out a lot of fat with some small changes.

- about 4093 milliseconds can be removed from https://lp-oops.canonical.com/oops.py/?oopsid=2023D24 by cacheing the decision of whether a person is in a bug's administrated teams (the most repeated query in the OOPS analysis) in personsubscriptioninfo.py.

- The longest single query (1269 ms) comes from listifying bug.getSubscribersForPerson(self.user). I'm not sure how to improve that query for this case, but I can look at it.

- The next longest query (1215 ms) comes from getting all the subscriptions in a single big query. This is already optimized from getting the subscriptions singly. As before, I can look at this.

For https://lp-oops.canonical.com/oops.py/?oopsid=2064O33 (from comment 8), the first change will save 3632 ms.

I think I'll stop analyzing and start making this fix (the one in personsubscriptioninfo.py). We can find more optimizations, but this one is such an easy and big win that I should just do it.

Changed in launchpad:
assignee: nobody → Gary Poster (gary)
Gary Poster (gary)
Changed in launchpad:
status: Triaged → In Progress
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Gary Poster (gary)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.