SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.OWNER, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache, BugTask.fti
FROM BugTask, Bug
WHERE Bug.id = BugTask.bug
AND BugTask.status in (10, 15, 20, 21, 22, 25)
AND Bug.duplicateof IS NULL
AND BugTask.bug IN (SELECT DISTINCT BugMessage.bug
FROM BugMessage, Message
WHERE Message.OWNER = 931129
AND Message.id = BugMessage.message)
AND Bug.private = FALSE
Should bring the query down to ~5 seconds, a clear improvement. This /changes/ the definition of the fourth union from 'commented on' to 'commented on or created' (because the initial comment creates the bug). However, we're unioning with 'created the bug' anyway, so thats fine (but may need some refactoring to accomodate things that want to query just 'commented on' (but they will need improving similarly to this to work at all - separate problem though :)). Once this is working, we can look into making that 5 seconds subsecond. For posterity - this is the query plan of the forth part of the union - its doing a scan of 290K bug rows to get down to 30K candidates, from there filters through bugmessage and message.
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=309637.56..346307.47 rows=200 width=397) (actual time=5407.976..5759.241 rows=3449 loops=1)
-> Hash Semi Join (cost=309637.56..345044.11 rows=200 width=401) (actual time=5407.772..5722.149 rows=4570 loops=1)
Hash Cond: (bugtask.bug = bugmessage.bug)
-> Bitmap Heap Scan on bugtask (cost=3846.05..38339.17 rows=257607 width=397) (actual time=119.834..301.939 rows=289425 loops=1) Recheck Cond: (status = ANY ('{10,15,20,21,22,25}'::integer[]))
-> Bitmap Index Scan on bugtask__status__idx (cost=0.00..3781.64 rows=257607 width=0) (actual time=102.681..102.681 rows=296332 loops=1) Index Cond: (status = ANY ('{10,15,20,21,22,25}'::integer[]))
-> Hash (cost=304054.32..304054.32 rows=138976 width=4) (actual time=5287.536..5287.536 rows=31907 loops=1)
-> HashAggregate (cost=301274.80..302664.56 rows=138976 width=4) (actual time=5263.038..5276.298 rows=31907 loops=1) -> Hash Join (cost=202215.04..300927.36 rows=138976 width=4) (actual time=1118.848..5228.975 rows=36602 loops=1) Hash Cond: (bugmessage.message = message.id) -> Seq Scan on bugmessage (cost=0.00..54001.69 rows=3465669 width=8) (actual time=0.012..602.738 rows=3465084 loops=1) -> Hash (cost=200479.04..200479.04 rows=138880 width=4) (actual time=312.647..312.647 rows=140128 loops=1) -> Bitmap Heap Scan on message (cost=2223.25..200479.04 rows=138880 width=4) (actual time=77.864..233.038 rows=140128 loops=1) Recheck Cond: (owner = 931129) -> Bitmap Index Scan on message_owner_idx (cost=0.00..2188.53 rows=138880 width=0) (actual time=50.955..50.955 rows=140128 loops=1) Index Cond: (owner = 931129)
-> Index Scan using bug_pkey on bug (cost=0.00..6.30 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=4570)
Index Cond: (bug.id = bugtask.bug)
Filter: ((bug.duplicateof IS NULL) AND (NOT bug.private))
Total runtime: 5761.136 ms
The correlated query:
...
AND Bug.duplicateof IS NULL
AND exists (SELECT true from BugMessage, Message
WHERE Message.OWNER = 931129
AND Message.id = BugMessage.message
and BugMessage.bug = Bugtask.bug)
is slightly (1second) slower on the tests I've done. A possible reason for this is that the total number of uniquely commented on bugs for ~janitor is < 4K, so the overhead of filtering bugmessage down within the set of open bugs isn't actually useful. This may be different for different persons; I'll pick a hugely profilic one and see.
Changing the fourth union to be this:
SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask. date_assigned, BugTask. date_closed, BugTask. date_confirmed, BugTask. date_fix_ committed, BugTask. date_fix_ released, BugTask. date_incomplete , BugTask. date_inprogress , BugTask. date_left_ closed, BugTask. date_left_ new, BugTask. date_triaged, BugTask. datecreated, BugTask. distribution, BugTask. distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.OWNER, BugTask.product, BugTask. productseries, BugTask. sourcepackagena me, BugTask.status, BugTask. statusexplanati on, BugTask. targetnamecache , BugTask.fti
( SELECT DISTINCT BugMessage.bug
FROM BugTask, Bug
WHERE Bug.id = BugTask.bug
AND BugTask.status in (10, 15, 20, 21, 22, 25)
AND Bug.duplicateof IS NULL
AND BugTask.bug IN
FROM BugMessage, Message
WHERE Message.OWNER = 931129
AND Message.id = BugMessage.message)
AND Bug.private = FALSE
Should bring the query down to ~5 seconds, a clear improvement. This /changes/ the definition of the fourth union from 'commented on' to 'commented on or created' (because the initial comment creates the bug). However, we're unioning with 'created the bug' anyway, so thats fine (but may need some refactoring to accomodate things that want to query just 'commented on' (but they will need improving similarly to this to work at all - separate problem though :)). Once this is working, we can look into making that 5 seconds subsecond. For posterity - this is the query plan of the forth part of the union - its doing a scan of 290K bug rows to get down to 30K candidates, from there filters through bugmessage and message.
-------
Nested Loop (cost=309637.
-> Hash Semi Join (cost=309637.
Hash Cond: (bugtask.bug = bugmessage.bug)
-> Bitmap Heap Scan on bugtask (cost=3846.
-> Bitmap Index Scan on bugtask_
-> Hash (cost=304054.
-> HashAggregate (cost=301274.
-> Index Scan using bug_pkey on bug (cost=0.00..6.30 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=4570)
Index Cond: (bug.id = bugtask.bug)
Filter: ((bug.duplicateof IS NULL) AND (NOT bug.private))
Total runtime: 5761.136 ms
The correlated query:
...
AND Bug.duplicateof IS NULL
( SELECT true from
BugMessage, Message
AND exists
WHERE Message.OWNER = 931129
AND Message.id = BugMessage.message
and BugMessage.bug = Bugtask.bug)
is slightly (1second) slower on the tests I've done. A possible reason for this is that the total number of uniquely commented on bugs for ~janitor is < 4K, so the overhead of filtering bugmessage down within the set of open bugs isn't actually useful. This may be different for different persons; I'll pick a hugely profilic one and see.