Oslo - a Library of Common OpenStack Code

Latest eventlet causes AQMP ECONNREFUSED in tests

Reported by Davanum Srinivas (DIMS) on 2012-12-27
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
oslo
Critical
Davanum Srinivas (DIMS)
Grizzly
Critical
Davanum Srinivas (DIMS)

Bug Description

2012-12-27 10:41:28 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:28 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:28 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2012-12-27 10:41:28 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 5 seconds.
2012-12-27 10:41:33 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:33 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:33 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 7 seconds.
2012-12-27 10:41:33 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 7 seconds.
2012-12-27 10:41:40 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:40 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:40 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 9 seconds.
2012-12-27 10:41:40 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 9 seconds.
2012-12-27 10:41:49 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:49 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:41:49 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 11 seconds.
2012-12-27 10:41:49 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 11 seconds.
2012-12-27 10:42:00 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:00 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:00 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 13 seconds.
2012-12-27 10:42:00 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 13 seconds.
2012-12-27 10:42:13 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:13 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:13 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 15 seconds.
2012-12-27 10:42:13 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 15 seconds.
2012-12-27 10:42:28 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:28 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:28 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 17 seconds.
2012-12-27 10:42:28 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 17 seconds.
2012-12-27 10:42:45 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:45 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:42:45 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 19 seconds.
2012-12-27 10:42:45 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 19 seconds.
2012-12-27 10:43:04 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:04 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:04 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 21 seconds.
2012-12-27 10:43:04 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 21 seconds.
2012-12-27 10:43:25 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:25 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:25 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 23 seconds.
2012-12-27 10:43:25 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 23 seconds.
2012-12-27 10:43:48 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:48 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:43:48 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 25 seconds.
2012-12-27 10:43:48 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 25 seconds.
2012-12-27 10:44:13 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:13 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:13 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 27 seconds.
2012-12-27 10:44:13 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 27 seconds.
2012-12-27 10:44:40 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:40 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:44:40 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 29 seconds.
2012-12-27 10:44:40 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 29 seconds.
2012-12-27 10:45:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:45:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:45:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:46:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:46:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:47:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:47:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:48:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:48:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:09 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:09 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:09 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:09 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:39 12387 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:39 12427 INFO openstack.common.rpc.common [-] Reconnecting to AMQP server on localhost:5672
2012-12-27 10:49:39 12387 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.
2012-12-27 10:49:39 12427 ERROR openstack.common.rpc.common [-] AMQP server on localhost:5672 is unreachable: Socket closed. Trying again in 30 seconds.

There's a temporary work around in progress - https://review.openstack.org/#/c/18678/

So, the waitpid is invoked from openstack/common/service.py

    def _wait_child(self):
        try:
            pid, status = os.wait()
        except OSError as exc:
            if exc.errno not in (errno.EINTR, errno.ECHILD):
                raise
            return None

Now the question is what do we need to change it to? assuming the patch from vishy to eventlet is correct.

Fix proposed to branch: master
Review: https://review.openstack.org/18689

Changed in oslo:
assignee: nobody → Davanum Srinivas (dims-v)
status: New → In Progress
Thierry Carrez (ttx) on 2013-01-02
Changed in oslo:
importance: Undecided → Critical
Mark McLoughlin (markmc) on 2013-01-02
Changed in oslo:
milestone: none → grizzly-2

Reviewed: https://review.openstack.org/18689
Committed: http://github.com/openstack/oslo-incubator/commit/ffeb0855085617095f19296770a1223cb5641d1c
Submitter: Jenkins
Branch: master

commit ffeb0855085617095f19296770a1223cb5641d1c
Author: Davanum Srinivas <email address hidden>
Date: Thu Dec 27 12:22:56 2012 -0500

    Don't rely on os.wait() blocking

    Before this fix which was included in eventlet 0.10.0:

    https://bitbucket.org/which_linden/eventlet/issue/92/eventletgreen-override-of-oswaitpid

    the os.wait() wrapper in eventlet would not block if no child processes
    had exited. This was clearly buggy behaviour and os.wait() no blocks
    correctly. However, it appears it now does not return if the parent process
    was interrupted by a signal. This means that you can't kill the parent process
    and have the child processes cleaned up gracefully.

    To avoid all these issues, switch to non-blocking wait behaviour. We use
    WNOHANG with waitpid() to poll for an exited child process and, if there
    is none, we yield to other green threads and sleep for a short period.
    This means excessive CPU usage, though, which will be tracked by
    bug #1095346.

    Check pid to avoid some unnecessary warning/info messages in the logs

    Fixes LP #1094076

    Change-Id: I783fac652376c2daa9d591403b9f4c1fe9523043

Changed in oslo:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2013-01-09
Changed in oslo:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers