Fullstack loses test workers if eventlet's Timeout is raised

Bug #1625221 reported by Ihar Hrachyshka
This bug affects 1 person
Affects   Status         Importance   Assigned to       Milestone
neutron   Fix Released   High         Ihar Hrachyshka

Bug Description

It seems the unittest module is not capable of catching eventlet's Timeout exception; instead, the whole worker is killed. We need to catch the exception ourselves if a test case raises it. Otherwise we lose some test cases (all of those that were scheduled for the dead worker to execute).
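The failure mode can be reproduced without eventlet. The sketch below assumes, as the related fix later in this report states, that eventlet's timeout class derives from BaseException rather than Exception. `FakeTimeout` and `run_worker` are hypothetical stand-ins, not neutron or testtools code: a worker loop that guards each case only with `except Exception` dies on the first such timeout, and every case still queued for it is never run.

```python
class FakeTimeout(BaseException):
    """Hypothetical stand-in for eventlet's Timeout (assumed BaseException subclass)."""


def case_ok():
    pass


def case_fails():
    raise AssertionError("ordinary failure")


def case_times_out():
    raise FakeTimeout()


def case_never_runs():
    pass


def run_worker(cases, results):
    """Run scheduled cases in order, recording each outcome in results."""
    for case in cases:
        try:
            case()
            results[case.__name__] = "ok"
        except Exception as exc:  # does not match BaseException-only subclasses
            results[case.__name__] = "fail: " + str(exc)


results = {}
try:
    run_worker([case_ok, case_fails, case_times_out, case_never_runs], results)
except FakeTimeout:
    pass  # the worker loop is gone; nothing after case_times_out was run

print(results)  # {'case_ok': 'ok', 'case_fails': 'fail: ordinary failure'}
```

Neither `case_times_out` nor `case_never_runs` gets a result, which mirrors the "lost" test cases described above.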

Tags: fullstack
tags: added: fullstack
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/372552

Changed in neutron:
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
status: New → In Progress
Changed in neutron:
milestone: none → newton-rc2
milestone: newton-rc2 → ocata-1
tags: added: newton-rc-potential
Changed in neutron:
importance: Undecided → High
IWAMOTO Toshihiro (iwamoto) wrote :

It seems it happens only with neutron jobs.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3Aeventlet.timeout.Timeout

If test runners were killed by unhandled exceptions, I'd expect testr_result or testrepository.subunit.gz to be corrupted, which doesn't seem to be the case.
That leaves a big wtf??? in my mind. ;)

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/373973

IWAMOTO Toshihiro (iwamoto) wrote :

For the record,

the exceptions handled by testtools are enumerated here, and they haven't changed for years.

https://github.com/testing-cabal/testtools/blob/master/testtools/testcase.py#L270

Unhandled exceptions are reraised from here.

https://github.com/testing-cabal/testtools/blob/master/testtools/runtest.py#L119
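The handle-or-reraise behaviour described in the two links above can be modelled as an ordered table of exception types and handlers, where anything matching no entry propagates to the caller. This is a simplified, hypothetical model, not testtools' actual code; `HANDLERS`, `run_user`, `SkipCase` and `FakeTimeout` are illustrative names. Because the catch-all entry stops at `Exception`, a BaseException subclass falls through and is re-raised.

```python
class FakeTimeout(BaseException):
    """Hypothetical stand-in for eventlet's Timeout (assumed BaseException subclass)."""


class SkipCase(Exception):
    """Hypothetical skip signal."""


def report_skip(exc):
    return "skip"


def report_failure(exc):
    return "failure"


def report_error(exc):
    return "error"


# Ordered (type, handler) pairs; the first isinstance() match wins.
HANDLERS = [
    (SkipCase, report_skip),
    (AssertionError, report_failure),
    (Exception, report_error),  # the catch-all stops at Exception
]


def run_user(fn):
    """Call fn and map any raised exception through HANDLERS.

    Exceptions that match no handler are re-raised unchanged.
    """
    try:
        fn()
    except BaseException as exc:
        for exc_type, handler in HANDLERS:
            if isinstance(exc, exc_type):
                return handler(exc)
        raise  # unhandled: propagate to whoever runs the test
    return "success"


def boom(exc):
    """Return a callable that raises exc."""
    def fn():
        raise exc
    return fn


print(run_user(lambda: None))            # success
print(run_user(boom(AssertionError())))  # failure
print(run_user(boom(KeyError("k"))))     # error
try:
    run_user(boom(FakeTimeout()))
except FakeTimeout:
    print("FakeTimeout propagated to the caller")
```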

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/372552
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e11102e42a26c31ed6606ddf3e7c83cff18dd52b
Submitter: Jenkins
Branch: master

commit e11102e42a26c31ed6606ddf3e7c83cff18dd52b
Author: Ihar Hrachyshka <email address hidden>
Date: Sat Sep 17 01:22:09 2016 +0000

    tests: catch eventlet.Timeout exception

    This exception kills the running test worker with all test cases
    scheduled to it, probably because the unittest module is not capable of
    surviving it.

    This patch makes all test cases catch the exception and convert it to
    a unittest-friendly failure mode (triggered with self.fail).

    Change-Id: I5b0d1efa458ca57dfce637dc75d419fe127751ed
    Closes-Bug: #1625221

Changed in neutron:
status: In Progress → Fix Released
tags: removed: newton-rc-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.0.0b1

This issue was fixed in the openstack/neutron 10.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/373973
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Kevin Benton (<email address hidden>) on branch: master
Review: https://review.openstack.org/373973

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/373973
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=42cc227798d912cd1edd254f3ea228bab3225b24
Submitter: Jenkins
Branch: master

commit 42cc227798d912cd1edd254f3ea228bab3225b24
Author: Jakub Libosvar <email address hidden>
Date: Wed Sep 21 05:31:31 2016 -0400

    Change default exception in wait_until_true

    By default, wait_until_true uses the default exception from eventlet,
    which is eventlet.TimeoutError. This class is not a subclass of
    Exception but of BaseException. If wait_until_true times out in any
    test, the whole test executor worker is stopped, leaving the scheduled
    tests unexecuted. This patch replaces eventlet.TimeoutError with a new
    WaitTimeout exception that inherits from Exception and thus won't break
    execution of other test cases when it's raised.

    Related-Bug: 1625221
    Change-Id: I44c0c22f427f61d84963e6e59393b90fbaa8f058
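The idea behind this related fix is that a helper's own timeout error should derive from Exception, so ordinary `except Exception` handlers treat it as a normal test error. The polling sketch below is an assumption-laden illustration: it uses `time.monotonic()` and `time.sleep()` instead of eventlet, and its signature (`predicate`, `timeout`, `sleep`, `exception`) is hypothetical rather than neutron's exact `wait_until_true` API.

```python
import time


class WaitTimeout(Exception):
    """Raised when a condition is not met in time; an Exception, not BaseException."""


def wait_until_true(predicate, timeout=60, sleep=1, exception=None):
    """Poll predicate() until it returns a truthy value or timeout seconds elapse.

    Raises `exception` if one is given, otherwise WaitTimeout.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            if exception is not None:
                raise exception
            raise WaitTimeout("Timed out after %s seconds" % timeout)
        time.sleep(sleep)


try:
    wait_until_true(lambda: False, timeout=0.05, sleep=0.01)
except Exception as exc:  # reachable now, unlike a BaseException timeout
    print(type(exc).__name__)  # WaitTimeout
```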

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential