Latest eventlet causes AQMP ECONNREFUSED in tests

Bug #1094076 reported by Davanum Srinivas (DIMS)
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
oslo-incubator
Fix Released
Critical
Davanum Srinivas (DIMS)
Grizzly
Fix Released
Critical
Davanum Srinivas (DIMS)

Bug Description

2012-12-27 10:41:28 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:28 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:28 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2012-12-27 10:41:28 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2012-12-27 10:41:33 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:33 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:33 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 7 seconds.
2012-12-27 10:41:33 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 7 seconds.
2012-12-27 10:41:40 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:40 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:40 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 9 seconds.
2012-12-27 10:41:40 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 9 seconds.
2012-12-27 10:41:49 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:49 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:49 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 11 seconds.
2012-12-27 10:41:49 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 11 seconds.
2012-12-27 10:42:00 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:00 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:00 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 13 seconds.
2012-12-27 10:42:00 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 13 seconds.
2012-12-27 10:42:13 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:13 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:13 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 15 seconds.
2012-12-27 10:42:13 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 15 seconds.
2012-12-27 10:42:28 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:28 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:28 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 17 seconds.
2012-12-27 10:42:28 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 17 seconds.
2012-12-27 10:42:45 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:45 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:45 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 19 seconds.
2012-12-27 10:42:45 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 19 seconds.
2012-12-27 10:43:04 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:04 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:04 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 21 seconds.
2012-12-27 10:43:04 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 21 seconds.
2012-12-27 10:43:25 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:25 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:25 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 23 seconds.
2012-12-27 10:43:25 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 23 seconds.
2012-12-27 10:43:48 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:48 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:48 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 25 seconds.
2012-12-27 10:43:48 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 25 seconds.
2012-12-27 10:44:13 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:13 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:13 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 27 seconds.
2012-12-27 10:44:13 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 27 seconds.
2012-12-27 10:44:40 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:40 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:40 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 29 seconds.
2012-12-27 10:44:40 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 29 seconds.
2012-12-27 10:45:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

There's a temporary work around in progress - https://review.openstack.org/#/c/18678/

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

So, the waitpid is invoked from openstack/common/service.py

    def _wait_child(self):
        try:
            pid, status = os.wait()
        except OSError as exc:
            if exc.errno not in (errno.EINTR, errno.ECHILD):
                raise
            return None

Now the question is what do we need to change it to? assuming the patch from vishy to eventlet is correct.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/18689

Changed in oslo:
assignee: nobody → Davanum Srinivas (dims-v)
status: New → In Progress
Thierry Carrez (ttx)
Changed in oslo:
importance: Undecided → Critical
Mark McLoughlin (markmc)
Changed in oslo:
milestone: none → grizzly-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo-incubator (master)

Reviewed: https://review.openstack.org/18689
Committed: http://github.com/openstack/oslo-incubator/commit/ffeb0855085617095f19296770a1223cb5641d1c
Submitter: Jenkins
Branch: master

commit ffeb0855085617095f19296770a1223cb5641d1c
Author: Davanum Srinivas <email address hidden>
Date: Thu Dec 27 12:22:56 2012 -0500

    Don't rely on os.wait() blocking

    Before this fix which was included in eventlet 0.10.0:

    https://bitbucket.org/which_linden/eventlet/issue/92/eventletgreen-override-of-oswaitpid

    the os.wait() wrapper in eventlet would not block if no child processes
    had exited. This was clearly buggy behaviour and os.wait() no blocks
    correctly. However, it appears it now does not return if the parent process
    was interrupted by a signal. This means that you can't kill the parent process
    and have the child processes cleaned up gracefully.

    To avoid all these issues, switch to non-blocking wait behaviour. We use
    WNOHANG with waitpid() to poll for an exited child process and, if there
    is none, we yield to other green threads and sleep for a short period.
    This means excessive CPU usage, though, which will be tracked by
    bug #1095346.

    Check pid to avoid some unnecessary warning/info messages in the logs

    Fixes LP #1094076

    Change-Id: I783fac652376c2daa9d591403b9f4c1fe9523043

Changed in oslo:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in oslo:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.