Comment 5 for bug 421901

Revision history for this message
Robert Collins (lifeless) wrote :

Changing the fourth union to be this:

SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.OWNER, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache, BugTask.fti
         FROM BugTask, Bug
         WHERE Bug.id = BugTask.bug
           AND BugTask.status in (10, 15, 20, 21, 22, 25)
           AND Bug.duplicateof IS NULL
           AND BugTask.bug IN
             (SELECT DISTINCT BugMessage.bug
              FROM BugMessage, Message
              WHERE Message.OWNER = 931129
                AND Message.id = BugMessage.message)
           AND Bug.private = FALSE

Should bring the query down to ~5 seconds, a clear improvement. This /changes/ the definition of the fourth union from 'commented on' to 'commented on or created' (because the initial comment creates the bug). However, we're unioning with 'created the bug' anyway, so thats fine (but may need some refactoring to accomodate things that want to query just 'commented on' (but they will need improving similarly to this to work at all - separate problem though :)). Once this is working, we can look into making that 5 seconds subsecond. For posterity - this is the query plan of the forth part of the union - its doing a scan of 290K bug rows to get down to 30K candidates, from there filters through bugmessage and message.

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop (cost=309637.56..346307.47 rows=200 width=397) (actual time=5407.976..5759.241 rows=3449 loops=1)
   -> Hash Semi Join (cost=309637.56..345044.11 rows=200 width=401) (actual time=5407.772..5722.149 rows=4570 loops=1)
         Hash Cond: (bugtask.bug = bugmessage.bug)
         -> Bitmap Heap Scan on bugtask (cost=3846.05..38339.17 rows=257607 width=397) (actual time=119.834..301.939 rows=289425 loops=1)
               Recheck Cond: (status = ANY ('{10,15,20,21,22,25}'::integer[]))
               -> Bitmap Index Scan on bugtask__status__idx (cost=0.00..3781.64 rows=257607 width=0) (actual time=102.681..102.681 rows=296332 loops=1)
                     Index Cond: (status = ANY ('{10,15,20,21,22,25}'::integer[]))
         -> Hash (cost=304054.32..304054.32 rows=138976 width=4) (actual time=5287.536..5287.536 rows=31907 loops=1)
               -> HashAggregate (cost=301274.80..302664.56 rows=138976 width=4) (actual time=5263.038..5276.298 rows=31907 loops=1)
                     -> Hash Join (cost=202215.04..300927.36 rows=138976 width=4) (actual time=1118.848..5228.975 rows=36602 loops=1)
                           Hash Cond: (bugmessage.message = message.id)
                           -> Seq Scan on bugmessage (cost=0.00..54001.69 rows=3465669 width=8) (actual time=0.012..602.738 rows=3465084 loops=1)
                           -> Hash (cost=200479.04..200479.04 rows=138880 width=4) (actual time=312.647..312.647 rows=140128 loops=1)
                                 -> Bitmap Heap Scan on message (cost=2223.25..200479.04 rows=138880 width=4) (actual time=77.864..233.038 rows=140128 loops=1)
                                       Recheck Cond: (owner = 931129)
                                       -> Bitmap Index Scan on message_owner_idx (cost=0.00..2188.53 rows=138880 width=0) (actual time=50.955..50.955 rows=140128 loops=1)
                                             Index Cond: (owner = 931129)
   -> Index Scan using bug_pkey on bug (cost=0.00..6.30 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=4570)
         Index Cond: (bug.id = bugtask.bug)
         Filter: ((bug.duplicateof IS NULL) AND (NOT bug.private))
 Total runtime: 5761.136 ms

The correlated query:
...

           AND Bug.duplicateof IS NULL
           AND exists
             (SELECT true from
              BugMessage, Message
              WHERE Message.OWNER = 931129
                AND Message.id = BugMessage.message
                and BugMessage.bug = Bugtask.bug)

is slightly (1second) slower on the tests I've done. A possible reason for this is that the total number of uniquely commented on bugs for ~janitor is < 4K, so the overhead of filtering bugmessage down within the set of open bugs isn't actually useful. This may be different for different persons; I'll pick a hugely profilic one and see.