timeout on bugtask:+index

Bug #607935 reported by Robert Collins on 2010-07-20
38
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
Robert Collins

Bug Description

     46 https://bugs.launchpad.net/ubuntu/+bug/1/+index (BugTask:+index)
       OOPS-1871A208, OOPS-1871C1576, OOPS-1871C185, OOPS-1871C1851, OOPS-1871C2041
     14 https://launchpad.net/ubuntu/+bug/1/+index (BugTask:+index)
       OOPS-1871C1411, OOPS-1871D1166, OOPS-1871D1924, OOPS-1871D1996, OOPS-1871D287

Branch: launchpad-rev-12351
Revno: 12351
SQL time: 9410 ms
Non-sql time: 7162 ms
Total time: 16572 ms
Statement Count: 234

1 5573.0 1 SQL-launchpad-main-master
SELECT MessageChunk.blob, MessageChunk.content,
                          MessageChunk.id,
                          MessageChunk.message,
                          MessageChunk.sequence,
                          Message.datecreated,
                          Message.id,
                          Message.OWNER, Message.parent,
                                         Message.raw,
                                         Message.rfc822msgid,
                                         Message.subject
FROM BugMessage,
     Message,
     MessageChunk
WHERE Message.id = MessageChunk.message
  AND BugMessage.message = Message.id
  AND BugMessage.bug = $INT
  AND ($INT=$INT)
ORDER BY Message.datecreated,
         Message.id,
         MessageChunk.sequence
2 2246.0 1 SQL-launchpad-main-master
SELECT BugActivity.bug,
       BugActivity.datechanged,
       BugActivity.id,
       BugActivity.message,
       BugActivity.newvalue,
       BugActivity.oldvalue,
       BugActivity.person,
       BugActivity.whatchanged
FROM BugActivity
WHERE BugActivity.bug = %s
ORDER BY BugActivity.id

1 44 89 2 87 SQL-launchpad-main-master
SELECT Person.account,
       Person.creation_comment,
       Person.creation_rationale,
       Person.datecreated,
       Person.defaultmembershipperiod,
       Person.defaultrenewalperiod,
       Person.displayname,
       Person.hide_email_addresses,
       Person.homepage_content,
       Person.icon,
       Person.id,
       Person.logo,
       Person.mailing_list_auto_subscribe_policy,
       Person.merged,
       Person.mugshot,
       Person.name,
       Person.personal_standing,
       Person.personal_standing_reason,
       Person.registrant,
       Person.renewal_policy,
       Person.subscriptionpolicy,
       Person.teamdescription,
       Person.teamowner,
       Person.verbose_bugnotifications,
       Person.visibility
FROM Person
WHERE Person.id = %sLIMIT $INT
2 40 60 1 59 SQL-launchpad-main-master
SELECT TeamParticipation.id,
       TeamParticipation.person,
       TeamParticipation.team
FROM TeamParticipation
WHERE TeamParticipation.person = %s
  AND TeamParticipation.team = %s

Related branches

Robert Collins (lifeless) wrote :

SQL time: 4060 ms
Non-sql time: 11654 ms
Total time: 15714 ms
Statement Count: 367

https://bugs.edge.launchpad.net/ubuntu/+source/acpi-support/+bug/59695/+index

most queries are pretty fast - this is the slowest:
 115.0 1 launchpad-main-slave SELECT MessageChunk.blob, MessageChunk.content, MessageChunk.id, MessageChunk.message, MessageChunk.sequence, Message.datecreated, Message.id, Message.owner, Message.parent, Message.raw, Message.rfc822msgid, Message.subject FROM BugMessage, Message, MessageChunk WHERE Message.id = MessageChunk.message AND BugMessage.message = Message.id AND BugMessage.bug = 59695 AND (1=1) ORDER BY Message.datecreated, Message.id, MessageChunk.sequence

And most of the time is in python - probably a formatter. I suggest getting a kcachegrind profile from staging for that url.

Robert Collins (lifeless) wrote :

Further to that though, 4 seconds of sql time is scary. batching
27 384 14 370 launchpad-main-slave SELECT Person.account, Person.addressline1, Person.addressline2, Person.city, Person.country, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.organization, Person.personal_standing, Person.personal_standing_reason, Person.phone, Person.postcode, Person.province, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE Person.id = %s LIMIT 1
would save 10% of the sql time straight up.

tags: removed: oops
Robert Collins (lifeless) wrote :

I got an oops from staging, with profile, for this.

https://lp-oops.canonical.com/oops.py/?oopsid=1703S298

Some initial analysis: 22% in Storm SQLObjectResultSet __nonzero__ via both lp.bugs.browser.bugtask line 649 and 3241 - 330 calls.

36% in rendering the page template. Some cycles are making it a little tricky to read; the 405 queries (which actually means nearly 700 db statements, due to the way check disconnected works) are my primary concern; I'll try to track those down tomorrow.

Fixed in stable r11482 (http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/11482) by a commit, but not testable.

Changed in malone:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 10.09
tags: added: qa-untestable
Changed in malone:
status: Triaged → Fix Committed
Robert Collins (lifeless) wrote :

My improvements make regular bug pages snappier (about 1.2 seconds faster) but don't help with the performance of really big pages. This will probably drop the oops frequency significantly though.

Changed in malone:
status: Fix Committed → Triaged
tags: removed: qa-untestable
Robert Collins (lifeless) wrote :
Download full text (6.6 KiB)

Current timeout from staging
https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1707S24

(from bug 1)

SQL time: 9513 ms
Non-sql time: 4433 ms
Total time: 13946 ms
Statement Count: 628

There is a bunch of IO in there :
1 4996.0 1 launchpad-main-master SELECT MessageChunk.blob, MessageChunk.content, MessageChunk.id, MessageChunk.message, MessageChunk.sequence, Message.datecreated, Message.id, Message.owner, Message.parent, Message.raw, Message.rfc822msgid, Message.subject FROM BugMessage, Message, MessageChunk WHERE Message.id = MessageChunk.message AND BugMessage.message = Message.id AND BugMessage.bug = 1 AND (1=1) ORDER BY Message.datecreated, Message.id, MessageChunk.sequence

is the main query - getting the messages. I'm surprised its that long, but another attempt didn't even get this far.

repeating, problem queries:
time Average time Saving Database id Statement
1 152 1623 10 1613 launchpad-main-master SELECT %s FROM (SELECT BugSubscription.bug, BugSubscription.bug_notification_level, BugSubscription.date_created, BugSubscription.id, BugSubscription.person, BugSubscription.subscribed_by FROM BugSubscription WHERE BugSubscription.person = %s AND BugSubscription.bug = %s LIMIT 1) AS "_tmp" LIMIT 1
2 48 133 2 131 launchpad-main-master SELECT TeamParticipation.id, TeamParticipation.person, TeamParticipation.team FROM TeamParticipation WHERE TeamParticipation.person = %s AND TeamParticipation.team = %s
3 35 292 8 284 launchpad-main-master SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE Person.id = %s LIMIT 1
4 19 158 8 150 launchpad-main-master SELECT Milestone.active, Milestone.codename, Milestone.dateexpected, Milestone.distribution, Milestone.distroseries, Milestone.id, Milestone.name, Milestone.product, Milestone.productseries, Milestone.summary FROM Milestone WHERE Milestone.product = %s AND Milestone.active = %s ORDER BY milestone_sort_key(Milestone.dateexpected, Milestone.name) DESC
5 17 222 13 209 launchpad-main-master SELECT BugTask.assignee, BugTask.bug, BugTask.bugwatch, BugTask.date_assigned, BugTask.date_closed, BugTask.date_confirmed, BugTask.date_fix_committed, BugTask.date_fix_released, BugTask.date_incomplete, BugTask.date_inprogress, BugTask.date_left_closed, BugTask.date_left_new, BugTask.date_triaged, BugTask.datecreated, BugTask.distribution, BugTask.distroseries, BugTask.id, BugTask.importance, BugTask.milestone, BugTask.owner, BugTask.product, BugTask.productseries, BugTask.sourcepackagename, BugTask.status, BugTask.statusexplanation, BugTask.targetnamecache, "_prejoin1".account, "_prejoin1".creation_comment, "_prejoin1".creation_rationale, "_prejoin1".datecreated, "_prejoin1".defa...

Read more...

Changed in malone:
assignee: Robert Collins (lifeless) → nobody
Robert Collins (lifeless) wrote :

Ignore that OOPS; staging isn't running what we thought it was running, this still needs analysis.

Robert Collins (lifeless) wrote :
Download full text (7.9 KiB)

Ok, valid updated staging OOPS:

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1708S4

https://bugs.staging.launchpad.net/ubuntu/+bug/1/+index

SQL time: 4218 ms
Non-sql time: 6597 ms
Total time: 10815 ms
Statement Count: 345

Slow query (> 1/2 time)
1 2432.0 1 launchpad-main-master SELECT BugActivity.bug, BugActivity.datechanged, BugActivity.id, BugActivity.message, BugActivity.newvalue, BugActivity.oldvalue, BugActivity.person, BugActivity.whatchanged FROM BugActivity WHERE BugActivity.bug = %s ORDER BY BugActivity.id

Variable substitution is stopping correct analysis of repeated statements, but these look like demand-loading to me:

229. 2425 2ms launchpad-main-master SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE Person.id = %s LIMIT 1
230. 2428 1ms launchpad-main-master SELECT TeamParticipation.id, TeamParticipation.person, TeamParticipation.team FROM TeamParticipation WHERE TeamParticipation.person = %s AND TeamParticipation.team = %s
231. 2430 2ms launchpad-main-master SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE Person.id = %s LIMIT 1
232. 2433 1ms launchpad-main-master SELECT TeamParticipation.id, TeamParticipation.person, TeamParticipation.team FROM TeamParticipation WHERE TeamParticipation.person = %s AND TeamParticipation.team = %s
233. 2458 2ms launchpad-main-master SELECT Milestone.active, Milestone.codename, Milestone.dateexpected, Milestone.distribution, Milestone.distroseries, Milestone.id, Milestone.name, Milestone.product, Milestone.productseries, Milestone.summary FROM Milestone WHERE Milestone.product = %s AND Milestone.active = %s ORDER BY milestone_sort_key(Milestone.dateexpected, Milestone.name) DESC
234. 2464 2ms launchpad-main-master SELECT Product.answers_usage, Product.blueprints_usage, Product.owner, Product.translations_usage, Product.active, Product.autoupdate, Product.bug_reported_acknowledgement, Product.bug_reporting_guidelines, Product.bug_supervisor, Product.bugtracker, Product.date_next_suggest_packaging, Product.datecreated, Product.description, Product.development_focus, Product.displayname, Product.downloadurl, Product.driv...

Read more...

tags: added: pg83
Curtis Hovey (sinzui) on 2010-12-31
Changed in launchpad:
milestone: 10.09 → none
Changed in launchpad:
importance: High → Critical
era (era) wrote :

So is there any hope for a workaround? Could pages with frequent oopses be cached or something?

Curtis Hovey (sinzui) wrote :

We have abandoned the caching approach because it can only reduce the occurrence of the issue. We want to fix the root cause, which is often poor data retrieval.

We are building an index of the messages in bug comments, the index is
indexing about 5000 messages an hour; once it it is built we'll change
our code to keep it up to date on every bug message change, and also
change our code to start querying from that index. This should make a
significant difference.

description: updated
Robert Collins (lifeless) wrote :

so the 5 second query is 20s cold on qastaging, 300ms hot; this may be a cold cache effect.

However, what we want is the first and last 40 comments;

this seems reasonable and is 30ms rather than 300ms hot:

SELECT MessageChunk.blob, MessageChunk.content,
                          MessageChunk.id,
                          MessageChunk.message,
                          MessageChunk.sequence,
                          Message.datecreated,
                          Message.id,
                          Message.OWNER, Message.parent,
                                         Message.raw,
                                         Message.rfc822msgid,
                                         Message.subject
FROM BugMessage,
     Message,
     MessageChunk
WHERE Message.id = MessageChunk.message
  AND BugMessage.message = Message.id
  AND BugMessage.bug = 1
  and (bugmessage.index < 41 or bugmessage.index > (select max(index) -40 from bugmessage where bug=1))
 ORDER BY BugMessage.index, Message.datecreated,
         Message.id,
         MessageChunk.sequence;

Robert Collins (lifeless) wrote :

The code is structured to want two lookups for this, but the overhead is only 10ms, so this should be tolerable.

Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
assignee: nobody → Robert Collins (lifeless)
milestone: none → 11.03
tags: added: qa-needstesting
Changed in launchpad:
status: Triaged → Fix Committed
Robert Collins (lifeless) wrote :

There is more to do here, my patch *has* introduced some lazy evaluation, but it has also reduced the work as intended - so much so that bug 1 renders reliably on qastaging (after dealing with initial cold cache impact on the db). I'm going to follow up and diagnose the lazy evalution, try and drive this lower.

tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant) on 2011-02-18
Changed in launchpad:
status: Fix Committed → Fix Released
Robert Collins (lifeless) wrote :

So this specific part of bugtasks being slow is addressed - we no longer process all those messages in python every time. Theres more to come but other bugs will cover that (like bug 721056).

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers