Bug #1357578 “Unit test: nova.tests.integrated.test_multiproces...” : Bugs : OpenStack Compute (nova)

Tracy Jones (tjones-i) on 2014-08-27

tags:

added: testing

Revision history for this message

Joe Gordon (jogo) wrote on 2014-08-30:

#1

Elastic recheck patch: https://review.openstack.org/#/c/114680/

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-09-06:

#2

Bug 1364494 appears to be related.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-09-07:

#3

Maybe some clues here, i.e. there might be an eventlet.Timeout that's leaking:

http://lists.openstack.org/pipermail/openstack-dev/2014-September/045272.html

Revision history for this message

Jay Pipes (jaypipes) wrote on 2014-09-08:

#4

Matt, I think the eventlet.Timeout thing is a red herring. fixtures.Timeout uses the signal module from the standard Python library to interrupt the process after timeout_secs. It does not use eventlet.Timeout. That said, there are two places I found that are incorrectly using eventlet.timeout.Timeout() without a context manager or calling close():

jaypipes@cranky:~/repos/openstack/nova/nova$ ack-grep "Timeout\("
tests/virt/libvirt/test_driver.py
8925: raise eventlet.timeout.Timeout()

tests/virt/hyperv/test_vmops.py
196: mock_with_timeout.side_effect = etimeout.Timeout()

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-11: Related fix proposed to nova (master)

#5

Related fix proposed to branch: master
Review: https://review.openstack.org/120901

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-11: Change abandoned on nova (master)

#6

Change abandoned by Jay Pipes (<email address hidden>) on branch: master
Review: https://review.openstack.org/120901
Reason: going to split this into log-only and other stuff.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-11: Related fix proposed to nova (master)

#7

Related fix proposed to branch: master
Review: https://review.openstack.org/120909

Revision history for this message

Joe Gordon (jogo) wrote on 2014-09-11:

#8

marked as critical because seeing this in the gate, 7 hits in 24 hours

Changed in nova:
importance:	High → Critical

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-13: Related fix merged to nova (master)

#9

Reviewed: https://review.openstack.org/120909
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=608630b0b27dd13847b1e149541c2e1799500421
Submitter: Jenkins
Branch: master

commit 608630b0b27dd13847b1e149541c2e1799500421
Author: Jay Pipes <email address hidden>
Date: Thu Sep 11 16:59:28 2014 -0400

Adds LOG statements in multiprocess API test

    Renames _terminate_workers() to _wait_for_workers_to_end() to accurately
    match what is done in that method. Adds a bunch of LOG.info() and
    LOG.warn() calls in the various methods in the test to get better
    debugging information when bug # 1357578 rears its ugly head.

Change-Id: I6698b7c71ec651c812138c8dd93b1e1b33ee1178
Related-bug: #1357578

Revision history for this message

Davanum Srinivas (DIMS) (dims-v) wrote on 2014-09-16:

#10

Jay,

would adding a loop with the following help?
os.waitpid(-1, os.WNOHANG)

https://gist.github.com/matthewstory/4547282
https://github.com/kanaka/noVNC/issues/47

thanks,
dims

Revision history for this message

Brent Eagles (beagles) wrote on 2014-09-17:

#11

I'm afraid I don't quite understand the train of thought implied by the discussion about what the actual *issue* is. On the face of it, it simply looks like the test took too long and the stack trace is "interesting" because we just happened to be in the eventlet trampoline code. This doesn't seem too surprising, so is it suspected that there is a worker hung somewhere or some other obvious thing that shouldn't be happening?

Revision history for this message

Brent Eagles (beagles) wrote on 2014-09-17:

#12

Okay, after reviewing the commit messages in the patch(es) and the
comments in the code, it seems pretty obvious. It does appear that the
gist of the bug is simply "this times out but we don't know why at the
moment". Unless the timeout is set wicked short, considering the test a
valid timeout scenario seems pretty improbable. Good test to find
something so weird.

So are these a few of the choices?

- Workers don't shutdown properly when they get the signal.

- Somehow the signal isn't getting to a worker in the first place. At
  the moment, this seems possible only if something were to happen to
  the ProcessLauncher so that it would "forget" about the child/worker
  PID, i.e. leaking a child process. One scenario would be some kind of
  oddness where a worker process caused os.waitpid() to return some kind
  of status for itself in _wait_child(). The logging Jay Pipes added
  will hilight any weirdness like that.

- By some very strange twist, a previous test case didn't cleanup
properly and there is a timeout hanging around to cause trouble? Is
that even possible?

Joe Gordon (jogo) on 2014-09-18

Changed in nova:
milestone:	none → juno-rc1

Revision history for this message

Sean Dague (sdague) wrote on 2014-09-18:

#13

I managed to get a reproduce by creating a slow vm: ubuntu 14.04 in vbox, 1 G ram, 2 vcpu, set to 50% of cpu performance.

tox -epy27 -- --until-fail multiprocess

On the 3rd time through I got the following:

running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ./nova/tests --load-list /tmp/tmp7YX5uh
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ./nova/tests --load-list /tmp/tmpU5Qsw_
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_killed_worker_recover [5.688803s] ... ok

Captured stderr:
~~~~~~~~~~~~~~~~
/home/sdague/nova/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/sql/default_comparator.py:35: SAWarning: The IN-predicate on "instances.uuid" was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate. Consider alternative strategies for improved performance.
return o[0](self, self.expr, op, *(other + o[1:]), **kwargs)

{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_killed_worker_recover [2.634592s] ... ok
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_restart_sighup [1.565492s] ... ok
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_terminate_sigterm [2.400319s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_restart_sighup [160.043131s] ... FAILED
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigkill [2.317150s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm [2.274788s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_terminate_sigkill [2.089225s] ... ok

and.... hang

So, testr is correctly killing the restart test when it times out. It is also correctly moving on to additional tests. However it is then in a hung state and can't finish once the tests are done.

Why the test timed out, I don't know. However the fact that testr is going crazy is an issue all by itself.

I managed to get a reproduce by creating a slow vm: ubuntu 14.04 in vbox, 1 G ram, 2 vcpu, set to 50% of cpu performance.

tox -epy27 -- --until-fail multiprocess

On the 3rd time through I got the following:

running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ./nova/tests  --load-list /tmp/tmp7YX5uh
running=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
OS_TEST_TIMEOUT=${OS_TEST_TIMEOUT:-160} \
${PYTHON:-python} -m subunit.run discover -t ./ ./nova/tests  --load-list /tmp/tmpU5Qsw_
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_killed_worker_recover [5.688803s] ... ok

Captured stderr:
~~~~~~~~~~~~~~~~
    /home/sdague/nova/.tox/py27/local/lib/python2.7/site-packages/sqlalchemy/sql/default_comparator.py:35: SAWarning: The IN-predicate on "instances.uuid" was invoked with an empty sequence. This results in a contradiction, which nonetheless can be expensive to evaluate.  Consider alternative strategies for improved performance.
      return o[0](self, self.expr, op, *(other + o[1:]), **kwargs)
    
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_killed_worker_recover [2.634592s] ... ok
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_restart_sighup [1.565492s] ... ok
{0} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_terminate_sigterm [2.400319s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_restart_sighup [160.043131s] ... FAILED
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigkill [2.317150s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm [2.274788s] ... ok
{1} nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITestV3.test_terminate_sigkill [2.089225s] ... ok

and.... hang

So, testr is correctly killing the restart test when it times out. It is also correctly moving on to additional tests. However it is then in a hung state and can't finish once the tests are done.

Why the test timed out, I don't know. However the fact that testr is going crazy is an issue all by itself.

Robert Collins (lifeless) on 2014-09-18

no longer affects:

testrepository

Revision history for this message

Robert Collins (lifeless) wrote on 2014-09-18:

#14

I've deleted the testr task: testr is (correctly) waiting for the test workers to finish (http://paste.openstack.org/show/112990/) - we can open a feature request to have a testr supervisor timeout facility, but its orthogonal to this nova issue.

Sean has grabbed stuff with gdb:
20727 - a worker we think - http://paste.openstack.org/show/112998/ - this appears to be waiting for a new request
20725 - parent of the worker, stuck in the poll hub - http://paste.openstack.org/show/113014/

Looking at the test code there are lots of issues that will make this test fragile:
- it assumes 0 delay between /a/ worker being spawned and all of them being spawned

- I was going to say that we don't set the multiprocessing workers 'daemon' flag - https://docs.python.org/2/library/multiprocessing.html and this will cause a parent process that exits to hang [same as with threads] unless the process object is killed out of band. OTOH if we want to guarantee all in-progress requests finish cleanly, this is appropriate - but we have no timeout protection on each request - but I observe that actually we don't even use multiprocessing - we've grown our own thing :(.

- it uses tearDown which is not guaranteed to run if setUp has issues.

The hanging process is from test_restart_sighup.

This test doesn't seem to test what SIGHUP does - SIGHUP doesn't restart any children, and the test doesn't strace the processes to see if SIGHUP is actually propogated to children, so the call to _wait should be a 100% no-op, *except* that we want to know the processes aren't being killed - thats why we cross check the pids. But the use of _wait means we'll exit on the first check - not allowing any time for the service process to do the wrong thing. That is, it can falsely pass.

The service code or handling signals is racy: it sets sigcaught to the received signal but multiple signals may be delivered before any are processed. I don't have reason to believe that that is whats happening here.

This is a worry:
except fixtures.TimeoutException:

we catch it, but we don't re-raise it. So we could 'pass' after being timed out, if it happens in tearDown. That seems wrong to me.

I've deleted the testr task: testr is (correctly) waiting for the test workers to finish (http://paste.openstack.org/show/112990/) - we can open a feature request to have a testr supervisor timeout facility, but its orthogonal to this nova issue.

Sean has grabbed stuff with gdb:
20727 - a worker we think - http://paste.openstack.org/show/112998/ - this appears to be waiting for a new request
20725 - parent of the worker, stuck in the poll hub - http://paste.openstack.org/show/113014/

Looking at the test code there are lots of issues that will make this test fragile:
 - it assumes 0 delay between /a/ worker being spawned and all of them being spawned

- I was going to say that we don't set the multiprocessing workers 'daemon' flag - https://docs.python.org/2/library/multiprocessing.html and this will cause a parent process that exits to hang [same as with threads] unless the process object is killed out of band. OTOH if we want to guarantee all in-progress requests finish cleanly, this is appropriate - but we have no timeout protection on each request - but I observe that actually we don't even use multiprocessing - we've grown our own thing :(.

- it uses tearDown which is not guaranteed to run if setUp has issues.

The hanging process is from test_restart_sighup.

This test doesn't seem to test what SIGHUP does - SIGHUP doesn't restart any children, and the test doesn't strace the processes to see if SIGHUP is actually propogated to children, so the call to _wait should be a 100% no-op, *except* that we want to know the processes aren't being killed - thats why we cross check the pids. But the use of _wait means we'll exit on the first check - not allowing any time for the service process to do the wrong thing. That is, it can falsely pass.

The service code or handling signals is racy: it sets sigcaught to the received signal but multiple signals may be delivered before any are processed. I don't have reason to believe that that is whats happening here.

This is a worry:
            except fixtures.TimeoutException:

we catch it, but we don't re-raise it. So we could 'pass' after being timed out, if it happens in tearDown. That seems wrong to me.

Revision history for this message

Robert Collins (lifeless) wrote on 2014-09-18:

#15

Looking further, _start_child in service.py is leaky - interrupt that at the wrong time and we won't capture the child pid, which will lead to us not killing it, and that will result in a stuck process tree.

Revision history for this message

Robert Collins (lifeless) wrote on 2014-09-18:

#16

Oh, for clarity; testr is hung here because the pipe it opened to the test process is still open. That means that this leaked worker process has the same stdout as the subunit.run process. Pushing up a simple fix which will stop this timing out in the same way (but obviously won't fix the underlying bug).

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-18: Fix proposed to nova (master)

#17

Fix proposed to branch: master
Review: https://review.openstack.org/122543

Changed in nova:
assignee:	nobody → Robert Collins (lifeless)
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-18:

#18

Fix proposed to branch: master
Review: https://review.openstack.org/122545

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-19: Fix merged to nova (master)

#19

Reviewed: https://review.openstack.org/122545
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4acf31fb3ac97fe84efb5833da58fa90c145d947
Submitter: Jenkins
Branch: master

commit 4acf31fb3ac97fe84efb5833da58fa90c145d947
Author: Robert Collins <email address hidden>
Date: Fri Sep 19 10:14:48 2014 +1200

tests: kill worker pids as well on timeouts

Workers that don't exit can also trigger a timeout, and we want our
test environment to be super robust - kill them aggressively too.

Also raise the timeout error so that we don't mask this when it
happens.

Change-Id: Iee606720f98e9a5e8a11f162eb885c0a00a2345e
Partial-Bug: #1357578

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-20:

#20

Reviewed: https://review.openstack.org/122543
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7d4a17f2ad72bf1ffa979f57a59bb2bd2727afe3
Submitter: Jenkins
Branch: master

commit 7d4a17f2ad72bf1ffa979f57a59bb2bd2727afe3
Author: Robert Collins <email address hidden>
Date: Fri Sep 19 10:07:52 2014 +1200

Close standard fds in test child process

    The std fd's are shared with parent processes, which makes testr think
    that we still have test workers, even though this isn't a test
    process, its a live server.

Change-Id: I516a6a9bd9a98abffeb182189038763ea94618f6
Partial-Bug: #1357578

Revision history for this message

Michael Still (mikal) wrote on 2014-09-22:

#21

I don't see any outstanding reviews for this bug. Is this correct or have I missed something?

Revision history for this message

Jay Pipes (jaypipes) wrote on 2014-09-22:

#22

There are no outstanding reviews, no, but there are a still a dozen or more failures in the gate since the last patch from lifeless was merged. So, unfortunately, I don't believe that those two patches have resolved this issue entirely.

Revision history for this message

Robert Collins (lifeless) wrote on 2014-09-22: Re: [Bug 1357578] Re: Unit test: nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm timing out in gate

#23

My patches were intended to help the diagnosis of the failure, not
prevent it: the issue is real - we're leaking processes that we fail
to shut down; my patches should have helped us get more logging and
diagnostics out when it happens.

Robert Collins (lifeless) on 2014-09-22

Changed in nova:
assignee:	Robert Collins (lifeless) → nobody
status:	In Progress → Triaged

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-23: Related fix proposed to nova (master)

#24

Related fix proposed to branch: master
Review: https://review.openstack.org/123426

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-23:

#25

Related fix proposed to branch: master
Review: https://review.openstack.org/123427

Revision history for this message

Sean Dague (sdague) wrote on 2014-09-23:

#26

So... my current theory is the following:

    def _get_workers(self):
        # NOTE(hartsocks): use of ps checks the process table for child pid
        # entries these processes may be ended but not reaped so ps may
        # show processes that are still being cleaned out of the table.
        f = os.popen('ps ax -o pid,ppid,command')
        # Skip ps header
        f.readline()

        processes = [tuple(int(p) for p in l.strip().split()[:2])
                     for l in f.readlines()]
        workers = [p for p, pp in processes if pp == self.pid]
        LOG.info('in _get_workers(), workers: %r for PPID %d',
                 workers, self.pid)
        return workers

Collects the subprocesses via ps.

However, when we loop on them, we assume that they were all sent the correct SIGTERM previously. And we os.kill(pid, 0), which doesn't actually send a kill signal.

So, if some earlier test failed on us, and it spawned a worker that we didn't expect, we could very easily stall polling the status of that process. And we'll never even try the rest of them based on the way the for loop works.

Proposed fix is to send a SIGTERM on every loop iteration.

Revision history for this message

Matt Riedemann (mriedem) wrote on 2014-09-23:

#27

This article from Doug Hellmann might be useful:

http://pymotw.com/2/signal/

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-23:

#28

Related fix proposed to branch: master
Review: https://review.openstack.org/123467

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-23: Fix proposed to nova (master)

#29

Fix proposed to branch: master
Review: https://review.openstack.org/123521

Changed in nova:
assignee:	nobody → Jay Pipes (jaypipes)
status:	Triaged → In Progress

Revision history for this message

Nikola Đipanov (ndipanov) wrote on 2014-09-23:

#30

Here's one more theory - Lifeless noticed correctly that "_start_child in service.py is leaky" but that really should not matter because child processes should die if the parent dies an unexpected death (from openstack/common/service.py):

This bit happens in the forked worker:

def _child_process(self, service):
self._child_process_handle_signal()

        # Reopen the eventlet hub to make sure we don't share an epoll
        # fd with parent and/or siblings, which would be bad
        eventlet.hubs.use_hub()

        # Close write to ensure only parent has it open
        os.close(self.writepipe)
        # Create greenthread to watch for parent to close pipe
        eventlet.spawn_n(self._pipe_watcher)

and the _pipe_watcher method:

   def _pipe_watcher(self):
        # This will block until the write end is closed when the parent
        # dies unexpectedly
        self.readpipe.read()

LOG.info(_LI('Parent process has died unexpectedly, exiting'))

sys.exit(1)

interesting bit is that it will only raise a SystemExit (not sure why it doesn't do the os._exit there) in a fork of a testr process that might actually be handling it in some way that can cause unexpected behavior?

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24: Related fix merged to nova (master)

#31

Reviewed: https://review.openstack.org/123467
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ff259324abf66f436b5098a6eaf8d781ac57140d
Submitter: Jenkins
Branch: master

commit ff259324abf66f436b5098a6eaf8d781ac57140d
Author: Sean Dague <email address hidden>
Date: Tue Sep 23 11:02:55 2014 -0400

Add further debug logging for multiprocess test

In order to track down where we are stalling add some additional
informational logging into the cleanup path.

Change-Id: Ia12b1049b3a63851bde69cb5f7c3fe31b0d12c41
Related-Bug: #1357578

Revision history for this message

Nikola Đipanov (ndipanov) wrote on 2014-09-24:

#32

Further to comment #30 - actually after looking at it more closely it seems that we do check for SystemExit in _spawn which should account for the _pipe_watching green thread killing a worker, so it's probably not that.

Revision history for this message

Nikola Đipanov (ndipanov) wrote on 2014-09-24:

#33

So one more thing I noticed while looking at this mess. The 'launcher' worker we spawn is forked off of a prcess that ran the whole setUp for the integrated_helpers._IntegratedTestBase that starts a bunch of services in the same process (meaning a bunch of greentrheads). I am not sure how they play with fork(), but usually best practice is to re-initiate the eventlet hub after fork.

I can imagine that we may be timing out because of a deadlock especially in the SIGINT case since the 'launcher' process needs to get to calling os.kill(SIGINT) on it's children to kill them.

How about we attempt to re-initialize eventlet in the 'launcher' process?

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24: Fix proposed to nova (master)

#34

Fix proposed to branch: master
Review: https://review.openstack.org/123667

Changed in nova:
assignee:	Jay Pipes (jaypipes) → Nikola Đipanov (ndipanov)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24:

#35

Fix proposed to branch: master
Review: https://review.openstack.org/123742

Changed in nova:
assignee:	Nikola Đipanov (ndipanov) → Sean Dague (sdague)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24: Change abandoned on nova (master)

#36

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/123427

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24:

#37

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/123426

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-24: Fix merged to nova (master)

#38

Reviewed: https://review.openstack.org/123742
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f3c99ba1efb1c039ec47dfdf29781f2e05f32b81
Submitter: Jenkins
Branch: master

commit f3c99ba1efb1c039ec47dfdf29781f2e05f32b81
Author: Sean Dague <email address hidden>
Date: Wed Sep 24 10:18:48 2014 -0400

remove test_multiprocess_api

This test suite should be removed for the following reasons.

1) it's not a unit test

    2) it's failing randomly, a lot, and depends on timing of the host for
    process start and signal handling which are provably not predicable on
    slower machines

    3) it's triggering testr bugs in the ways that it fails which means we
    don't actually get any subunit in the fail scenario, which makes it
    actually impossible to move forward on fixes.

Change-Id: Ieb7c02115c12cd9f6c9fa67691a643aed7632784
Closes-Bug: #1357578

Changed in nova:
status:	In Progress → Fix Committed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-25: Fix proposed to nova (stable/icehouse)

#39

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/123976

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-26: Change abandoned on nova (master)

#40

Change abandoned by Jay Pipes (<email address hidden>) on branch: master
Review: https://review.openstack.org/123521
Reason:
This test has been removed now.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-26:

#41

Change abandoned by Nikola Dipanov (<email address hidden>) on branch: master
Review: https://review.openstack.org/123667
Reason: We decided to rip the code out instead of fixing it! :)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2014-09-26: Fix merged to nova (stable/icehouse)

#42

Reviewed: https://review.openstack.org/123976
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4e6371b82a68c265c010fb2a2edb7235319d36b7
Submitter: Jenkins
Branch: stable/icehouse

commit 4e6371b82a68c265c010fb2a2edb7235319d36b7
Author: Sean Dague <email address hidden>
Date: Wed Sep 24 10:18:48 2014 -0400

remove test_multiprocess_api

This test suite should be removed for the following reasons.

1) it's not a unit test

    2) it's failing randomly, a lot, and depends on timing of the host for
    process start and signal handling which are provably not predicable on
    slower machines

    3) it's triggering testr bugs in the ways that it fails which means we
    don't actually get any subunit in the fail scenario, which makes it
    actually impossible to move forward on fixes.

Conflicts:
nova/tests/integrated/test_multiprocess_api.py

    Change-Id: Ieb7c02115c12cd9f6c9fa67691a643aed7632784
    Closes-Bug: #1357578
    (cherry picked from commit f3c99ba1efb1c039ec47dfdf29781f2e05f32b81)

tags:

added: in-stable-icehouse

Thierry Carrez (ttx) on 2014-10-01

Changed in nova:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2014-10-16

Changed in nova:
milestone:	juno-rc1 → 2014.2

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	Critical	Sean Dague	OpenStack Compute (nova) 2014.2 "juno"
	Icehouse	Fix Released	Critical	Ihar Hrachyshka	OpenStack Compute (nova) 2014.1.3

OpenStack Compute (nova)

Unit test: nova.tests.integrated.test_multiprocess_api.MultiprocessWSGITest.test_terminate_sigterm timing out in gate

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches