Time-out problem at merging accounts (LoginToken:+accountmerge)

Bug #104088 reported by Yannig MARCHEGAY (Kokoyaya)
Affects: Launchpad itself
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

I have a problem merging two accounts at Launchpad: https://launchpad.net/token/zXfc0X9CXBf3vB0TXCTj/+accountmerge (time-out). I have tried to merge yannick-marchegay and yannig several times over the past 4-5 weeks, without success :(

Current case exemplar: OOPS-1676A1281

Stuart Bishop (stub) wrote :

Reproduced - OOPS-462D1024

Changed in launchpad:
status: Unconfirmed → Confirmed
Yannig MARCHEGAY (Kokoyaya) (yannick-marchegay) wrote :

Error ID for last trial: OOPS-462A1082

Stuart Bishop (stub) wrote :

30,000-odd rows needed to be updated in the POSubmission table, and this took around 90 seconds to run manually. This may have been due to contention with the poimport process. If this becomes a regular problem, we could:
 - trap timeout errors, opening a support request of some sort.
 - increase the timeout for this function alone
 - move to an async merge system
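
A minimal sketch of the first option (trapping the timeout and opening a support request), assuming the merge raises a dedicated exception on timeout. `MergeTimeout`, `merge_with_fallback` and the `open_support_request` callable are hypothetical names for illustration, not Launchpad APIs:

```python
class MergeTimeout(Exception):
    """Hypothetical stand-in for the error raised when a merge times out."""


def merge_with_fallback(merge, from_account, to_account, open_support_request):
    """Run the merge; on timeout, open a support request instead of failing.

    `merge` and `open_support_request` are callables supplied by the caller
    (assumptions for this sketch).
    """
    try:
        merge(from_account, to_account)
    except MergeTimeout as error:
        # Hand the case to a human rather than asking the user to retry.
        ticket = open_support_request(
            f"Merge of {from_account} into {to_account} timed out: {error}"
        )
        return ("needs-support", ticket)
    return ("merged", None)
```

This only changes what the user sees on failure; the merge itself still runs inside the web request.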

This particular case has been handled.

Changed in launchpad:
assignee: nobody → stub
status: Confirmed → Fix Released
Curtis Hovey (sinzui) wrote :

I am reopening this because the root cause (the code) is still the issue. OOPS-1676A1281 shows failures in the translation merge step. Failure can happen in any step, and the only recourse for the user is to repeat the merge until it succeeds. One user had to do this 27 times.

affects: launchpad-foundations → launchpad-registry
Changed in launchpad-registry:
status: Fix Released → Triaged
importance: Undecided → Low
tags: added: timeout
Changed in launchpad-registry:
assignee: Stuart Bishop (stub) → nobody
visibility: private → public
Robert Collins (lifeless) wrote :

High as per zero oops policy

description: updated
Changed in launchpad-registry:
importance: Low → High
Curtis Hovey (sinzui) wrote :

I have failed to get agreement to build the feature I think is required to commit to fixing this. I treat High as a commitment to fix this within 3 months. I really cannot commit anyone on my team to do that. I really do not want to lie to anyone about our commitment to fix this issue. How will this be fixed in 3 months? Is there a SQL fix planned instead of an async process? Is someone from another team committing the week(s) to do this?

Curtis Hovey (sinzui)
Changed in launchpad-registry:
milestone: none → series-future
Robert Collins (lifeless) wrote : Re: [Bug 104088] Re: Time-out problem at merging accounts

I am diving into timeouts all over the place, and I may well get to
this one. I like the idea of having high == commitment, but that
doesn't seem to fit with the zerooopspolicy which is about making sure
that oops & timeout are consistently at the front of the queue -
unless we drop all other bugs to non-high ?

Curtis Hovey (sinzui)
tags: added: merge-deactivate
removed: timeout
Curtis Hovey (sinzui) wrote : Re: Time-out problem at merging accounts

My first concern is setting expectations about when this will be fixed. Regardless of the priority, this gets set to the series milestone to indicate that it is being watched and I am working to fit it into the schedule. I did add it to the kanboard, but I personally feel that I am being dishonest because I do not think any engineer can pull this from the queue and start working on it.

Well, engineers can and do take bugs like this and work on the small part of the code in the OOPS, and that makes me think we will do a minor fix for this that just reduces the frequency of the issue. I have convinced myself that finding a systemic fix (an async op) can address a lot of bugs; see the bug tag merge-deactivate.

tags: added: timeout
Robert Collins (lifeless) wrote : Re: [Bug 104088] Re: Time-out problem at merging accounts

What things are blocking an engineer working on a systemic fix?

Stuart Bishop (stub) wrote : Re: Time-out problem at merging accounts

We cannot fix this at the database level. There will always be some user with too many records to update in a single transaction, no matter what timeout we select. The process needs to be moved out of band.
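
As a rough illustration of why no single timeout suffices, the figures quoted earlier in this report (about 30,000 POSubmission rows in about 90 seconds, measured on one manual run that may have been slowed by contention) can be turned into a ceiling on rows per transaction. The timeout values below are hypothetical; the thread does not state the real one.

```python
# Figures quoted earlier in this bug report (one manual run, possibly under contention).
ROWS_UPDATED = 30_000
SECONDS_TAKEN = 90


def max_rows_within(timeout_seconds):
    """Rows that fit in one transaction at the observed rate (integer estimate)."""
    return ROWS_UPDATED * timeout_seconds // SECONDS_TAKEN


# Hypothetical timeouts: whichever is chosen, a user with more rows still times out.
for timeout in (15, 30, 60):
    print(timeout, "seconds ->", max_rows_within(timeout), "rows")
```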

Curtis Hovey (sinzui) wrote :

This is a large amount of work: the jobs, the emails to users, the bugs about what merge is not doing right. Undertaking this is feature-level work. It removes an engineer from other features for a release. The registry team is still working on bridging-the-gap while stakeholders and the rest of the Launchpad team ask why the new privacy has not started.

Francis J. Lacoste (flacoste) wrote : Re: [Bug 104088] Re: Time-out problem at merging accounts

On August 4, 2010, Robert Collins wrote:
> I am diving into timeouts all over the place, and I may well get to
> this one. I like the idea of having high == commitment, but that
> doesn't seem to fit with the zerooopspolicy which is about making sure
> that oops & timeout are consistently at the front of the queue -
> unless we drop all other bugs to non-high ?

They are, but that's not the only timeout OOPS Registry has. Given the state
of the queue and Registry cycle time, Curtis's doubts about his capacity to
fix that within his 'High' timeframe are founded.

--
Francis J. Lacoste
<email address hidden>

Robert Collins (lifeless) wrote :

On Fri, Aug 6, 2010 at 12:44 AM, Curtis Hovey
<email address hidden> wrote:
> This is a large amount of work, the jobs, emails to users, the bugs
> about what merge is not doing right. Undertaking this is feature level
> work. It is removing an engineer from other features for a release. The
> registry team is still working on bridging-the-gap while stakeholders and
> the rest of the launchpad team ask why the new privacy has not started.

So (paraphrasing) - it's not a small fix, and engineers can't simply
take (say) 2 weeks to put all the pieces into place. Further to that,
this isn't amenable to incremental fixes (because it needs to be done
outside the web transaction and that itself is several days work at
the moment).

So, I'll see what we can do to lower the bar for doing incremental
work of this sort.

Curtis Hovey (sinzui) wrote : Re: Time-out problem at merging accounts

That is a fair summary. I pondered landing code that provides an empty job, then, over a few branches, moving the existing merge code into it, then landing a UI branch to enable it. I could then schedule the many secondary bugs to make merging a complete process. I imagined doing about 2 branches a release until it was complete.

Stuart Bishop (stub) wrote : Re: [Bug 104088] Re: Time-out problem at merging accounts

On Fri, Aug 6, 2010 at 3:37 AM, Robert Collins
<email address hidden> wrote:
> On Fri, Aug 6, 2010 at 12:44 AM, Curtis Hovey
> <email address hidden> wrote:
>> This is a large amount of work, the jobs, emails to users, the bugs
>> about what merge is not doing right. Undertaking this is feature level
>> work. It is removing an engineer from other features for a release. The
>> registry team is still working on bridging-the-gap while stakeholders and
>> the rest of the launchpad team ask why the new privacy has not started.
>
> So (paraphrasing) - it's not a small fix, and engineers can't simply
> take (say) 2 weeks to put all the pieces into place. Further to that,
> this isn't amenable to incremental fixes (because it needs to be done
> outside the web transaction and that itself is several days work at
> the moment).

Moving the existing merge algorithm out of band would be a nice
incremental fix and a suitable test case for the messaging system when
it is available. Instead of calling the merge function and reporting
'ok', we send a message and report 'pending'. The daemon listening
invokes the existing merge and logs an OOPS if it fails. The load on
messaging will be trivial, and the resulting daemon will be minimal
and provide a template for more advanced uses.
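
The flow described above can be sketched with an in-process queue standing in for the messaging system, which is an assumption since that system is not specified in this thread. `request_merge`, `merge_worker`, the `merge` callable and the `log_oops` callable are hypothetical names, not Launchpad code:

```python
import queue
import threading


def request_merge(jobs, status, from_name, to_name):
    """Web-side step: queue the merge and report 'pending' straight away."""
    key = (from_name, to_name)
    status[key] = "pending"  # recorded before queuing so the worker cannot race it
    jobs.put(key)
    return "pending"


def merge_worker(jobs, status, merge, log_oops):
    """Daemon-side loop: run the existing merge for each queued request."""
    while True:
        key = jobs.get()
        if key is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        try:
            merge(*key)  # the unchanged merge function, now outside the web request
            status[key] = "merged"
        except Exception as error:
            log_oops(key, error)  # record the failure instead of timing out a page
            status[key] = "failed"
        finally:
            jobs.task_done()
```

A caller would start `merge_worker` in a `threading.Thread`, call `request_merge` from the request handler, and read `status` later to see whether the merge finished.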

Improving the UI (emails on success, web server polling until the job
is complete, etc.) can be further incremental fixes later.

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

summary: - Time-out problem at merging accounts
+ Time-out problem at merging accounts (LoginToken:+accountmerge)