Task search for a tag and ordered by id causes lp-oops

Bug #1894196 reported by Stefan Bader
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Triaged
High
Unassigned

Bug Description

OOPS: https://oops.canonical.com/?oopsid=OOPS-bf4a44f32a4a09a9b891d2b2848dc54e

When we do a task search via launchpadlib which looks for a certain tag and asks for ordering by id, we now reliably get an oops (caused by running too long). From memory this did not happen before. We can work around this by changing the query to not request any ordering, which actually is the better approach in our case as the code does not require results in any order (the constraint was added for debugging and then forgotten).

Testcase (example):
valid_states = [
              'New',
              'Confirmed',
              'Triaged',
              'In Progress',
              'Incomplete',
              'Fix Committed',
              'Fix Released',
              'Invalid',
          ]
search_tag = 'kernel-release-tracking-bug-live'

tasks = launchpadservice.launchpad.bugs.searchTasks(tags=search_tag, order_by='id', status=valid_states)

From internal discussions:
What seems to be going on is that the planner doesn't quite realise how selective the tag is: without the order_by="id", it correctly does an index-only scan on bugtag__tag__bug__key and an index scan on bugtaskflat__bug__idx where bug = bugtag.bug; but with order_by="id" (which turns into ORDER BY BugTaskFlat.bug, BugTaskFlat.bugtask), it decides to do an index scan on bugtaskflat__bug__bugtask__idx first which returns a couple of million rows and then merge that with the results of an index-only scan on bugtag__tag__bug__key, which is a much slower strategy.

There are possible restructurings of the query that work (e.g. replacing the tag search clause with BugTaskFlat.bug IN (SELECT BugTag.bug FROM BugTag WHERE BugTag.tag IN (E'kernel-release-tracking-bug-live') OFFSET 0)), but we'll need to make sure that they don't break other things

Revision history for this message
Colin Watson (cjwatson) wrote :
Download full text (10.4 KiB)

Query plan with order_by="id":

launchpad_staging=> EXPLAIN (ANALYZE, BUFFERS) SELECT BugTaskFlat.bugtask
FROM BugTaskFlat
LEFT JOIN Product ON BugTaskFlat.product = Product.id
AND Product.active
WHERE BugTaskFlat.status IN (10,
                             17,
                             20,
                             21,
                             22,
                             25,
                             30,
                             13,
                             14)
  AND BugTaskFlat.duplicateof IS NULL
  AND (BugTaskFlat.product IS NULL
       OR Product.active = TRUE)
  AND EXISTS
    (SELECT 1
     FROM BugTag
     WHERE BugTag.bug = BugTaskFlat.bug
       AND BugTag.tag IN (E'kernel-release-tracking-bug-live'))
  AND (BugTaskFlat.information_type IN (1,
                                        2)
       OR COALESCE((BugTaskFlat.access_grants)&&
                     (SELECT ARRAY_AGG(TeamParticipation.team)
                      FROM TeamParticipation
                      WHERE TeamParticipation.person = 1733822), FALSE)
       OR COALESCE((BugTaskFlat.access_policies)&&
                     (SELECT ARRAY_AGG(AccessPolicyGrant.policy)
                      FROM AccessPolicyGrant
                      JOIN TeamParticipation ON TeamParticipation.team = AccessPolicyGrant.grantee
                      WHERE TeamParticipation.person = 1733822), FALSE))
ORDER BY BugTaskFlat.bug,
         BugTaskFlat.bugtask DESC LIMIT 76
OFFSET 0;
                                                                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=939.41..11674.81 rows=76 width=8) (actual time=7901.839..8206.012 rows=76 loops=1)
   Buffers: shared hit=2079749 read=295069
   InitPlan 1 (returns $0)
     -> Aggregate (cost=25.71..25.72 rows=1 width=32) (actual time=0.453..0.454 rows=1 loops=1)
           Buffers: shared hit=97
           -> Index Scan using teamparticipation_person_idx on teamparticipation (cost=0.43..25.40 rows=126 width=4) (actual time=0.031..0.419 rows=117 loops=1)
                 Index Cond: (person = 1733822)
                 Buffers: shared hit=97
   InitPlan 2 (returns $2)
     -> Aggregate (cost=908.31..908.32 rows=1 width=32) (actual time=3.503..3.503 rows=1 loops=1)
           Buffers: shared hit=1866 read=18
           -> Nested Loop (cost=0.85..908.19 rows=44 width=4) (actual time=0.046..3.128 rows=2133 loops=1)
                 Buffers: shared hit=1866 read=18
                 -> Index Scan using teamparticipation_person_idx on teamparticipation teamparticipation_1 (cost=0.43..25.40 rows=126 width=4) (actual time=0.013..0.087 rows=117 loops=1)
                       Index Cond: (person = 1733822)
                       Buffers: shared hit=97
                 -> Index Only Scan using accesspolicygrant__g...

tags: added: bug-search lp-bugs oops timeout
Changed in launchpad:
status: New → Triaged
importance: Undecided → High
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.