oopses part of large infestations take forever to load

Bug #1394426 reported by Ryan Finnie
2
Affects Status Importance Assigned to Milestone
python-oops-tools
New
Undecided
Unassigned

Bug Description

When trying to load e.g. https://oops.canonical.com/oops/?oopsid=OOPS-7a3f809df4b4a40c7556e42f9b60f35e which is part of an 16801 oops infestation, the page takes a very long time to load (somewhere between 15 minutes and an hour, I didn't time it properly). The SQL query it appears to be stuck on is:

oops_tools=# EXPLAIN SELECT "oops_oops"."id", "oops_oops"."oopsinfestation_id", "oops_oops"."pathname", "oops_oops"."oopsid", "oops_oops"."prefix_id", "oops_oops"."pageid", "oops_oops"."date", "oops_oops"."url", "oops_oops"."http_method", "oops_oops"."duration", "oops_oops"."referrer", "oops_oops"."user_agent", "oops_oops"."most_expensive_statement", "oops_oops"."total_time", "oops_oops"."time_is_estimate", "oops_oops"."informational", "oops_oops"."appinstance_id", "oops_oops"."is_bot", "oops_oops"."is_local_referrer", "oops_oops"."classification_id", "oops_oops"."statements_count" FROM "oops_oops" WHERE "oops_oops"."oopsinfestation_id" = 6944977 ORDER BY "oops_oops"."date" DESC LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..2898.40 rows=1 width=322)
   -> Index Scan Backward using oops_oops_date on oops_oops (cost=0.00..38522683.30 rows=13291 width=322)
         Filter: (oopsinfestation_id = 6944977)
(3 rows)

Without the limit, the query appears to load quickly, but with it, it takes forever.

Revision history for this message
Ryan Finnie (fo0bar) wrote :

Appears to be http://stackoverflow.com/questions/21385555/postgresql-query-very-slow-with-limit-1 . Changing "LIMIT 1" to "LIMIT 50" returns results immediately.

Revision history for this message
Robert Collins (lifeless) wrote : Re: [Bug 1394426] Re: oopses part of large infestations take forever to load

I suggest adding an index on "oops_oops"."oopsinfestation_id

On 20 November 2014 14:19, Ryan Finnie <email address hidden> wrote:
> Appears to be http://stackoverflow.com/questions/21385555/postgresql-
> query-very-slow-with-limit-1 . Changing "LIMIT 1" to "LIMIT 50" returns
> results immediately.
>
> --
> You received this bug notification because you are subscribed to
> Launchpad Suite.
> https://bugs.launchpad.net/bugs/1394426
>
> Title:
> oopses part of large infestations take forever to load
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/python-oops-tools/+bug/1394426/+subscriptions

--
Robert Collins <email address hidden>
Distinguished Technologist
HP Converged Cloud

Revision history for this message
Ryan Finnie (fo0bar) wrote :
Download full text (3.8 KiB)

There is an index on oopsinfestation_id already. The problem seems to be postgres ignores the index if things like LIMIT 1 are in place and it thinks the index is unnecessary for performance. From what I gather from that stackoverflow post, anyway.

Compare the following. LIMIT 1 and LIMIT 3 take ages, LIMIT 50 is almost immediate:

oops_tools=# EXPLAIN SELECT "oops_oops"."id", "oops_oops"."oopsinfestation_id", "oops_oops"."pathname", "oops_oops"."oopsid", "oops_oops"."prefix_id", "oops_oops"."pageid", "oops_oops"."date", "oops_oops"."url", "oops_oops"."http_method", "oops_oops"."duration", "oops_oops"."referrer", "oops_oops"."user_agent", "oops_oops"."most_expensive_statement", "oops_oops"."total_time", "oops_oops"."time_is_estimate", "oops_oops"."informational", "oops_oops"."appinstance_id", "oops_oops"."is_bot", "oops_oops"."is_local_referrer", "oops_oops"."classification_id", "oops_oops"."statements_count" FROM "oops_oops" WHERE "oops_oops"."oopsinfestation_id" = 6944977 ORDER BY "oops_oops"."date" DESC LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..2625.76 rows=1 width=322)
   -> Index Scan Backward using oops_oops_date on oops_oops (cost=0.00..35846899.53 rows=13652 width=322)
         Filter: (oopsinfestation_id = 6944977)
(3 rows)

oops_tools=# EXPLAIN SELECT "oops_oops"."id", "oops_oops"."oopsinfestation_id", "oops_oops"."pathname", "oops_oops"."oopsid", "oops_oops"."prefix_id", "oops_oops"."pageid", "oops_oops"."date", "oops_oops"."url", "oops_oops"."http_method", "oops_oops"."duration", "oops_oops"."referrer", "oops_oops"."user_agent", "oops_oops"."most_expensive_statement", "oops_oops"."total_time", "oops_oops"."time_is_estimate", "oops_oops"."informational", "oops_oops"."appinstance_id", "oops_oops"."is_bot", "oops_oops"."is_local_referrer", "oops_oops"."classification_id", "oops_oops"."statements_count" FROM "oops_oops" WHERE "oops_oops"."oopsinfestation_id" = 6944977 ORDER BY "oops_oops"."date" DESC LIMIT 3;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (cost=0.00..7877.29 rows=3 width=322)
   -> Index Scan Backward using oops_oops_date on oops_oops (cost=0.00..35846899.53 rows=13652 width=322)
         Filter: (oopsinfestation_id = 6944977)
(3 rows)

oops_tools=# EXPLAIN SELECT "oops_oops"."id", "oops_oops"."oopsinfestation_id", "oops_oops"."pathname", "oops_oops"."oopsid", "oops_oops"."prefix_id", "oops_oops"."pageid", "oops_oops"."date", "oops_oops"."url", "oops_oops"."http_method", "oops_oops"."duration", "oops_oops"."referrer", "oops_oops"."user_agent", "oops_oops"."most_expensive_statement", "oops_oops"."total_time", "oops_oops"."time_is_estimate", "oops_oops"."informational", "oops_oops"."appinstance_id", "oops_oops"."is_bot", "oops_oops"."is_local_referrer", "oops_oops"."classification_id", "oops_oops"."statements_count" FROM "oops_oops" ...

Read more...

Revision history for this message
Robert Collins (lifeless) wrote :

Sorry, missed that: the index needs to be on (infestation_id, date)

Revision history for this message
Stuart Bishop (stub) wrote :

PostgreSQL thinks that for an infestation_id this common, it will find the most recent one quicky by walking the date index backwards. This is something that PostgreSQL's query planner often gets wrong.

As Robert says, an index on (infestation_id, date) should give PostgreSQL what it needs to run this query efficiently. If we create this index, we can drop the existing oops_oops_oopsinfestation_id index.

Alternatively, we can also use the optimization fence of the WITH clause to rewrite the query and force it to choose a more stable plan using the existing oopsinfestation_id index :

WITH oops AS (
    SELECT * FROM oops_oops WHERE oops_oops.oopsinfestation_id = 6944977
)
SELECT * FROM oops ORDER BY date DESC LIMIT 1

Revision history for this message
William Grant (wgrant) wrote :

On 20/11/14 22:25, Stuart Bishop wrote:
> PostgreSQL thinks that for an infestation_id this common, it will find
> the most recent one quicky by walking the date index backwards. This is
> something that PostgreSQL's query planner often gets wrong.
>
> As Robert says, an index on (infestation_id, date) should give
> PostgreSQL what it needs to run this query efficiently. If we create
> this index, we can drop the existing oops_oops_oopsinfestation_id index.
>
> Alternatively, we can also use the optimization fence of the WITH clause
> to rewrite the query and force it to choose a more stable plan using the
> existing oopsinfestation_id index :
>
> WITH oops AS (
> SELECT * FROM oops_oops WHERE oops_oops.oopsinfestation_id = 6944977
> )
> SELECT * FROM oops ORDER BY date DESC LIMIT 1

That will perform terribly for very common infestations (eg. Launchpad
timeouts).

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.