+editproposedmembers times out when working with many candidates

Bug #615654 reported by Curtis Hovey on 2010-08-10
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
High
Edwin Grubbs

Bug Description

https://lp-oops.canonical.com/oops.py/?oopsid=OOPS-1719N1105

SQL time: 5980 ms
Non-sql time: 59441 ms
Total time: 65421 ms
Statement Count: 1005

Quick analysis shows death by a thousand cuts.

The mail sending is certainly an issue but the big problem is lots of itty bitty tasks for the DB.

989. 13977 2ms SQL-launchpad-main-master SELECT %s FROM Person JOIN TeamParticipation ON TeamParticipation.person = Person.id WHERE Person.id = %s AND TeamParticipation.team = %s AND NOT (TeamParticipation.person = %s)
990. 13981 12ms SQL-launchpad-main-master UPDATE TeamMembership SET date_reviewed=%s, last_change_comment=%s, status=%s, date_joined=%s, last_changed_by=%s, date_last_changed=%s, reviewer_comment=%s, reviewed_by=%s WHERE TeamMembership.id = %s
991. 13998 7ms SQL-launchpad-main-master INSERT INTO TeamParticipation (person, team) SELECT 468328, superteam.team FROM TeamParticipation superteam WHERE superteam.person = 3390563 AND NOT EXISTS ( SELECT 1 FROM TeamParticipation WHERE person = 468328 AND team = superteam.team )
992. 14009 4ms SQL-launchpad-main-master SELECT * FROM ((SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person, TeamMembership WHERE TeamMembership.team = 3390563 AND TeamMembership.status = 3 AND TeamMembership.person = Person.id AND (1=1)) UNION (SELECT Person.account, Person.creation_comment, Person.creation_rationale, Person.datecreated, Person.defaultmembershipperiod, Person.defaultrenewalperiod, Person.displayname, Person.hide_email_addresses, Person.homepage_content, Person.icon, Person.id, Person.logo, Person.mailing_list_auto_subscribe_policy, Person.merged, Person.mugshot, Person.name, Person.personal_standing, Person.personal_standing_reason, Person.registrant, Person.renewal_policy, Person.subscriptionpolicy, Person.teamdescription, Person.teamowner, Person.verbose_bugnotifications, Person.visibility FROM Person WHERE id = 1015)) AS "_a684" ORDER BY person_sort_key(displayname, name)
993. 14015 2ms SQL-launchpad-main-master SELECT EmailAddress.account, EmailAddress.email, EmailAddress.id, EmailAddress.person, EmailAddress.status FROM EmailAddress WHERE EmailAddress.person = %s AND EmailAddress.status = %s ORDER BY EmailAddress.email
994. 14019 3ms SQL-launchpad-main-master SELECT EmailAddress.account, EmailAddress.email, EmailAddress.id, EmailAddress.person, EmailAddress.status FROM EmailAddress WHERE LOWER(EmailAddress.email) = %s ORDER BY EmailAddress.email
995. 14040 0ms sendmail yltsrc approved by sladen
996. 14041 1ms SQL-launchpad-main-master SELECT EmailAddress.account, EmailAddress.email, EmailAddress.id, EmailAddress.person, EmailAddress.status FROM EmailAddress WHERE LOWER(EmailAddress.email) = %s ORDER BY EmailAddress.email
997. 14063 0ms sendmail yltsrc approved by sladen
998. 14064 2ms SQL-launchpad-main-master SELECT EmailAddress.account, EmailAddress.email, EmailAddress.id, EmailAddress.person, EmailAddress.status FROM EmailAddress WHERE LOWER(EmailAddress.email) = %s ORDER BY EmailAddress.email
999. 14084 0ms sendmail yltsrc approved by sladen
1000. 14085 1ms SQL-launchpad-main-master SELECT EmailAddress.account, EmailAddress.email, EmailAddress.id, EmailAddress.person, EmailAddress.status FROM EmailAddress WHERE LOWER(EmailAddress.email) = %s ORDER BY EmailAddress.email

Related branches

Robert Collins (lifeless) wrote :

I agree that it may need batching. However it can probably actually needs to record intent rather than doing the teamparticipation changes directly: even a single team membership change could in principle require hundreds of sql updates to update the cache, and that may be better done in a queue outside the web request.

tags: removed: oops
Curtis Hovey (sinzui) wrote :

This bug relates to bug 353950. Bug 353950 bug is about editing the TeamMembership of a single large team that causes a timeout. This bug is about editing several TeamMemberships that can cause a timeout.

Fixing the issues in the older bug will alleviate some of the issues in this bug.

Changed in launchpad-registry:
assignee: nobody → Edwin Grubbs (edwin-grubbs)
status: Triaged → In Progress
Ursula Junque (ursinha) on 2010-08-18
Changed in launchpad-registry:
status: In Progress → Fix Committed
Ursula Junque (ursinha) on 2010-08-18
Changed in launchpad-registry:
milestone: series-future → 10.09
Curtis Hovey (sinzui) wrote :

I do no think this is fixed yet. I got OOPS-1692S124 using the same team on staging trying to approve 52 members (double the number in the original oops). The branch that landed did imrpove the performance (the companion bug is fixed). The form may be a factor in this problem.

Edwin Grubbs (edwin-grubbs) wrote :

I still haven't made the changes to queue sending email. I probably should have just made this bug and bug 365030 and named the branches part1 and part2.

Curtis Hovey (sinzui) wrote :

We are calling Person.setMembershipData() for each Person.proposedmembers to Approve or Decline. That in turn calls TeamMembership.setStatus() for each update. This method calls flush_database_updates() to ensure the data is pushed back the db, then self._sendStatusChangeNotification() is called.

The email operations, represented by the repeated email address lookups is difficult problem. I noted earlier today that since there is not event/subscribe mechanism for membership status changes, mailing-list-auto-subscriptions are broken in 3 use cases, this use case is broken after sending a some number of emails. We really want to decouple sending the notification from the approve/decline event, or queue a job.

I think we could get all the proposedmembers *and* their memberships is a single call, but in this case we would be dealing with 35 of the 398 queries (17 memberships were not reached at the time of timeout).

We could revise the TeamMembershipSet to do batch changes. In this use case, we want to make two calls to the TeamMembershipSet to change the status of a list of users to approves or declined and pass the reviewer and his message in. If the email is still sent in proc, we need al bulk email sender for each batch call.

tags: added: qa-bad
removed: qa-needstesting
tags: removed: qa-bad
Changed in launchpad-registry:
status: Fix Committed → In Progress

The _all_members code will help I suspect, if refactored to be able to
select on status;

Likewise, moving to set based operations on the candidates will
massively reduce DB calls.

Curtis Hovey (sinzui) on 2010-09-07
Changed in launchpad-registry:
milestone: 10.09 → 10.10
Edwin Grubbs (edwin-grubbs) wrote :

Here is a more recent oops:
OOPS-1719N1105

Robert Collins (lifeless) wrote :

On Thu, Sep 16, 2010 at 2:23 PM, Edwin Grubbs <email address hidden> wrote:
> Here is a more recent oops:
> OOPS-1719N1105

Thanks, I've had a look at it - it looks like a pretty typical untuned
page; simply eager loading and mass updating should make it perform a
whole lot better.

description: updated
Edwin Grubbs (edwin-grubbs) wrote :

On Wed, Sep 15, 2010 at 10:28 PM, Robert Collins
<email address hidden> wrote:
> On Thu, Sep 16, 2010 at 2:23 PM, Edwin Grubbs <email address hidden> wrote:
>> Here is a more recent oops:
>> OOPS-1719N1105
>
> Thanks, I've had a look at it - it looks like a pretty typical untuned
> page; simply eager loading and mass updating should make it perform a
> whole lot better.

While eager loading should be done, there is really no benefit to
processing the emails in the page load. A few minutes delay receiving
emails is not going to negatively affect the user experience as much
as a few seconds delay displaying the page. The number of emails being
sent will increase as teams get bigger, so queuing reduces the
likelihood that we will have to fix this again in the near future.
Eventually, just adding all the participants of large participants of
a new super team will have to be queued also, but hopefully by then,
Launchpad will have established infrastructure to notify users of the
ongoing background processes and their estimated time.

Robert Collins (lifeless) wrote :

Emails are sent after the page is rendered already, though there is a
bug with that.

But yes, I do agree we need to do more; I'm trying to say that we
should optimise the process *anyway* because inefficient code is still
slow on the backend and that will use up DB resources that are the
only unscalable component we have.

Edwin Grubbs (edwin-grubbs) wrote :

On Thu, Sep 16, 2010 at 1:40 PM, Robert Collins
<email address hidden> wrote:
> Emails are sent after the page is rendered already, though there is a
> bug with that.

Emails may be sent after the page is rendered, but gathering all the
information to send the emails is obviously not occurring before the
page times out. How is the page supposed to be rendered before
simple_sendmail() is called? That is what I mean by sending email in
the context of a view class.

> But yes, I do agree we need to do more; I'm trying to say that we
> should optimise the process *anyway* because inefficient code is still
> slow on the backend and that will use up DB resources that are the
> only unscalable component we have.

I think that requires another bug to track that task, since the oops
should be solved before that optimization takes place.

Robert Collins (lifeless) wrote :

On Tue, Sep 21, 2010 at 5:11 AM, Edwin Grubbs <email address hidden> wrote:
> On Thu, Sep 16, 2010 at 1:40 PM, Robert Collins
> <email address hidden> wrote:
>> Emails are sent after the page is rendered already, though there is a
>> bug with that.
>
> Emails may be sent after the page is rendered, but gathering all the
> information to send the emails is obviously not occurring before the
> page times out. How is the page supposed to be rendered before
> simple_sendmail() is called? That is what I mean by sending email in
> the context of a view class.

Sure; but gathering that information is cheap; its only slow here
because the page is inefficient.

>> But yes, I do agree we need to do more; I'm trying to say that we
>> should optimise the process *anyway* because inefficient code is still
>> slow on the backend and that will use up DB resources that are the
>> only unscalable component we have.
>
> I think that requires another bug to track that task, since the oops
> should be solved before that optimization takes place.

Possibly; adding a queue requires waiting 3 more weeks before its
usable. Optimising the existing code can be done in a day. However,
you're the dude doing the work, so its your call.

-Rob

Changed in launchpad-registry:
status: In Progress → Fix Committed
tags: added: qa-ok
removed: qa-needstesting
Curtis Hovey (sinzui) on 2010-10-14
Changed in launchpad-registry:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers