Better handle session expired issue

Bug #1557107 reported by Tony Tan
This bug affects 1 person
Affects: taskflow
Status: In Progress
Importance: High
Assigned to: Joshua Harlow
Milestone: none listed

Bug Description

After a seemingly bad job failed to be claimed, the taskflow worker ran into this issue:

https://gist.github.com/tonytan4ever/3827d32d387bb11d01cd

The most important part is:

JobFailure: Claiming failure:
 session expired#012 SessionExpiredError

It seems the session to ZooKeeper expired after a job claim failure. We need a better failure-handling mechanism for this situation.
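
To make the request concrete, here is a minimal sketch of one possible recovery pattern: treat a session expiry during a claim as recoverable, reconnect, and retry a bounded number of times. This is not taskflow's actual fix. `SessionExpired`, `ClaimFailed`, `claim_with_recovery`, and the `claim` and `reconnect` callables are all hypothetical names used only for illustration.

```python
class SessionExpired(Exception):
    """Hypothetical stand-in for the ZooKeeper client's session-expired error."""


class ClaimFailed(Exception):
    """Hypothetical error raised when a claim cannot be completed."""


def claim_with_recovery(claim, reconnect, max_attempts=3):
    """Call claim(); on session expiry, reconnect and try again.

    Both claim and reconnect are caller-supplied zero-argument callables
    (assumptions for this sketch, not taskflow APIs).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return claim()
        except SessionExpired as exc:
            if attempt == max_attempts:
                # Give up with a clear error instead of leaking the raw one.
                raise ClaimFailed("session kept expiring during claim") from exc
            # Re-establish the session before the next attempt.
            reconnect()
```

Whether retrying is safe depends on what state was lost with the expired session, so a real worker might prefer to abandon the job and pick another one.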

Joshua Harlow (harlowja)
Changed in taskflow:
importance: Undecided → High
assignee: nobody → Joshua Harlow (harlowja)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to taskflow (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/293075

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/294399

OpenStack Infra (hudson-openstack) wrote : Related fix merged to taskflow (master)

Reviewed: https://review.openstack.org/293041
Committed: https://git.openstack.org/cgit/openstack/taskflow/commit/?id=67b9e411536317384dc7fa6f83fefca2208f2be9
Submitter: Jenkins
Branch: master

commit 67b9e411536317384dc7fa6f83fefca2208f2be9
Author: Joshua Harlow <email address hidden>
Date: Tue Mar 15 09:52:03 2016 -0700

    Add periodic jobboard refreshing (incase of sync issues)

    Related-Bug: #1557107

    Change-Id: I42672ef63ef02ec5ec6a842d263d0db83d91fe45
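
For readers unfamiliar with the idea, periodic refreshing can be sketched roughly as below. This is an illustration under assumptions, not the code from the commit: `PeriodicRefresher`, its `refresh` callable, and the injectable `clock` are hypothetical names.

```python
import time


class PeriodicRefresher:
    """Runs a refresh callable once a fixed interval has elapsed (sketch)."""

    def __init__(self, refresh, interval, clock=time.monotonic):
        self._refresh = refresh      # hypothetical callable that resyncs the job list
        self._interval = interval    # seconds between forced refreshes
        self._clock = clock          # injectable so the timing can be tested
        self._last = clock()

    def maybe_refresh(self):
        """Refresh if the interval has passed; return True if a refresh ran."""
        now = self._clock()
        if now - self._last >= self._interval:
            self._refresh()
            self._last = now
            return True
        return False
```

A worker loop could call `maybe_refresh()` on each iteration so that a missed notification is eventually corrected by a full resync.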

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/293075
Committed: https://git.openstack.org/cgit/openstack/taskflow/commit/?id=35a9305f172ed8970caf1ff5cec261df7d3fe9ce
Submitter: Jenkins
Branch: master

commit 35a9305f172ed8970caf1ff5cec261df7d3fe9ce
Author: Joshua Harlow <email address hidden>
Date: Tue Mar 15 10:50:00 2016 -0700

    Ensure the fetching jobs does not fetch anything when in bad state

    When the underlying connection is in LOST or SUSPENDED mode do not
    allow jobs to be iterated over (and clear the local cache when the
    connection has been LOST).

    Related-Bug: #1557107

    Change-Id: Ic0a2ab2519ff8a7386d80d9092a0e24579883681
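
The behaviour described in this commit message can be sketched as follows. It is a simplified illustration, not the committed code: `ConnState` is a local stand-in for the client's connection states, and `JobCache` with its methods is a hypothetical name.

```python
import enum


class ConnState(enum.Enum):
    """Local stand-in for the connection states named in the commit message."""
    CONNECTED = "CONNECTED"
    SUSPENDED = "SUSPENDED"
    LOST = "LOST"


class JobCache:
    """Local job cache that refuses iteration while the connection is bad."""

    def __init__(self):
        self._state = ConnState.CONNECTED
        self._jobs = {}

    def add(self, name, job):
        self._jobs[name] = job

    def on_state_change(self, state):
        """Record the new state; drop cached jobs once the connection is LOST."""
        self._state = state
        if state is ConnState.LOST:
            self._jobs.clear()

    def iterjobs(self):
        """Yield cached jobs only while connected; otherwise yield nothing."""
        if self._state is not ConnState.CONNECTED:
            return iter(())
        # Copy the values so callers can iterate while the cache changes.
        return iter(list(self._jobs.values()))
```

In this sketch a SUSPENDED connection keeps the cache, so iteration resumes with the same jobs once the state returns to CONNECTED, while a LOST connection starts from an empty cache.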

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/294399
Committed: https://git.openstack.org/cgit/openstack/taskflow/commit/?id=c985dbb63de2b2890d1b4a050171195bbb123771
Submitter: Zuul
Branch: master

commit c985dbb63de2b2890d1b4a050171195bbb123771
Author: Joshua Harlow <email address hidden>
Date: Thu Mar 17 22:48:12 2016 -0700

    Avoid log warning when closing is underway (on purpose)

    Related-Bug: #1557107

    Change-Id: I8b2f327dadbf038cd050f05fbc46a428282a3d82

Ben Nemec (bnemec)
Changed in taskflow:
status: New → In Progress