I've deleted the testr task: testr is (correctly) waiting for the test workers to finish (http://paste.openstack.org/show/112990/) - we can open a feature request for a testr supervisor timeout facility, but it's orthogonal to this nova issue.
Looking at the test code there are lots of issues that will make this test fragile:
- it assumes 0 delay between /a/ worker being spawned and all of them being spawned
- I was going to say that we don't set the multiprocessing workers' 'daemon' flag - https://docs.python.org/2/library/multiprocessing.html - which will cause a parent process that exits to hang [same as with threads] unless the process object is killed out of band. OTOH if we want to guarantee all in-progress requests finish cleanly, this is appropriate - but we have no timeout protection on each request. However, I observe that we don't even use multiprocessing - we've grown our own thing :(.
- it uses tearDown, which is not guaranteed to run if setUp has issues.
The hanging process is from test_restart_sighup.
This test doesn't seem to test what SIGHUP does - SIGHUP doesn't restart any children, and the test doesn't strace the processes to see if SIGHUP is actually propagated to children, so the call to _wait should be a 100% no-op, *except* that we want to know the processes aren't being killed - that's why we cross-check the pids. But the use of _wait means we'll exit on the first check, not allowing any time for the service process to do the wrong thing. That is, it can falsely pass.
The service code for handling signals is racy: it sets sigcaught to the received signal, but multiple signals may be delivered before any are processed. I don't have reason to believe that that is what's happening here.
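A minimal stdlib sketch of the less racy pattern - the names here are illustrative, not the actual service code: accumulate every delivered signal in a list instead of overwriting a single sigcaught variable, so an earlier signal isn't lost when two arrive before the main loop inspects it.

```python
import os
import signal

# A single `sigcaught = signum` assignment only remembers the *last* signal;
# appending each delivery keeps them all for the main loop to process.
caught = []

def _handler(signum, frame):
    caught.append(signum)

signal.signal(signal.SIGUSR1, _handler)
signal.signal(signal.SIGUSR2, _handler)

# Deliver two different signals to ourselves before anything drains `caught`.
os.kill(os.getpid(), signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGUSR2)
```

With the single-variable pattern only SIGUSR2 would survive; here both deliveries are recorded.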
This is a worry:
except fixtures.TimeoutException:
we catch it but we don't re-raise it, so we could 'pass' after being timed out if the timeout happens in tearDown. That seems wrong to me.
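For contrast, a sketch of the re-raise pattern - TimeoutException here is a local stand-in for fixtures.TimeoutException, and tear_down/stop_workers are hypothetical names, not the real test code:

```python
class TimeoutException(Exception):
    """Stand-in for fixtures.TimeoutException, for this sketch only."""

def tear_down(stop_workers):
    try:
        stop_workers()
    except TimeoutException:
        # ...do any logging/diagnostics here...
        raise  # re-raise so a timed-out cleanup fails the test instead of passing
```

Swallowing the exception converts a hung cleanup into a silent pass; re-raising keeps the timeout visible in the test result.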
Sean has grabbed stuff with gdb:
20727 - a worker we think - http://paste.openstack.org/show/112998/ - this appears to be waiting for a new request
20725 - parent of the worker, stuck in the poll hub - http://paste.openstack.org/show/113014/