rpc_workers does not work with Qpid

Bug #1330199 reported by zhu zhu
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
oslo.messaging  Fix Released  Medium      Unassigned

Bug Description

After setting rpc_workers to a value other than 0 and restarting neutron-server, no consumers are ever created for q-plugin in Qpid.

It appears that all subprocesses of neutron-server hang at self.connection.open() in the reconnect method of impl_qpid.py.

Tags: qpid rpc
Xu Han Peng (xuhanp)
Changed in neutron:
assignee: nobody → Xu Han Peng (xuhanp)
tags: added: rpc
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Xu Han Peng (xuhanp) wrote :

By adding breakpoints to the following code, we can see that the program hangs at the first breakpoint and never reaches the second one.

    def start_rpc_listeners(self):
        self.endpoints = [rpc.RpcCallbacks(self.notifier, self.type_manager),
                          agents_db.AgentExtRpcCallback()]
        self.topic = topics.PLUGIN
        self.conn = n_rpc.create_connection(new=True)  # <-- first breakpoint
        self.conn.create_consumer(self.topic, self.endpoints,
                                  fanout=False)
        return self.conn.consume_in_threads()  # <-- second breakpoint

After adding breakpoints at the following places in the qpid code, we found that the program is waiting for the connection to be created but never gets the expected OK result.

/usr/lib/python2.7/dist-packages/qpid/messaging/endpoints.py

    @synchronized
    def attach(self):
        """
        Attach to the remote endpoint.
        """
        if not self._connected:
            self._connected = True
            self._driver.start()
            self._wakeup()
        self._ewait(lambda: self._transport_connected and not self._unlinked())

The qpid code returns only once self._transport_connected and not self._unlinked() evaluates to true.

Revision history for this message
zhu zhu (zhuzhubj) wrote :

It appears that even serializing start_rpc_listeners across the processes with locks would not solve this problem. I suspect that certain fields need to be altered when the process forks.

Revision history for this message
Qin Zhao (zhaoqin) wrote :

It seems that the qpid client inherited from the parent process does not work in the child process.

I reset qpid.selector.Selector.DEFAULT to None at the beginning of RpcWorker.start(), and the child process could then connect to the qpid server. However, it may not be a very good patch.
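
The workaround above can be illustrated without the qpid library. What follows is only a minimal sketch: the Selector class is a stand-in written for this example (it merely imitates a lazily created, process-wide singleton stored in a DEFAULT attribute), and reset_selector_in_child and worker_start are hypothetical names, not qpid or Neutron APIs.

```python
import os
import threading


class Selector:
    """Stand-in for a lazily created, process-wide selector (not qpid's class)."""

    DEFAULT = None
    _lock = threading.Lock()

    def __init__(self):
        # Remember which process built this instance.
        self.owner_pid = os.getpid()

    @classmethod
    def default(cls):
        # Create the singleton on first use, then keep returning it.
        with cls._lock:
            if cls.DEFAULT is None:
                cls.DEFAULT = cls()
            return cls.DEFAULT


def reset_selector_in_child():
    """Drop the inherited singleton so the next default() call builds a new one."""
    Selector.DEFAULT = None


def worker_start():
    """Hypothetical analogue of RpcWorker.start(): reset first, then connect."""
    reset_selector_in_child()
    return Selector.default()
```

Calling worker_start() at the top of a worker replaces whatever instance was inherited from the parent. As noted in the comment above, this kind of reset does nothing about sockets or threads the old instance owned.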

Revision history for this message
Qin Zhao (zhaoqin) wrote :

Qpid has a new patch for this issue. It is almost the same as my workaround...

https://issues.apache.org/jira/browse/QPID-5637

I do not think it is a very good patch. It does not prevent the parent process's qpid connection from leaking into the child process, and it does not clean up the nested qpid selector thread.

Revision history for this message
Qin Zhao (zhaoqin) wrote :

I do not think Qpid can have a perfect patch for this issue in the very short term. A complete fix may require a lot of code changes. I suggest that Neutron use the current patch and tolerate its side effects.

Revision history for this message
Matt Riedemann (mriedem) wrote :

I added oslo.messaging because the qpid JIRA issue discusses the need to run qpid in single-threaded mode in oslo.messaging as a long-term fix:

https://issues.apache.org/jira/browse/QPID-5637

Ben Nemec (bnemec)
Changed in oslo.messaging:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Ken Giusti (kgiusti) wrote :

I've hit the same issue while using the new AMQP 1.0 driver. Like the legacy qpid::messaging driver (impl_qpid), the new driver uses a background threading.Thread. This thread runs the main I/O loop - blocking on a select() call on the socket connected to the broker.

I've found that the neutron-server service appears to fork() after setting up a Listener. When the Listener is created, the driver opens a socket to the broker. After the fork, both parent and child get a copy of that socket - remember, the background thread gets no indication that a fork() was called - and Things Break.

FWIW, I've attached log output showing the call stack when the first call is made to create the Listener, and - from what I understand - how the workers are spawned afterwards.
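
One part of the failure described above, a background thread that exists in the parent but not in the forked child, can be reproduced outside oslo.messaging. The following is a minimal sketch, assuming a POSIX system where os.fork() is available. io_loop and threads_alive_in_child are hypothetical names, and the sleep loop only stands in for a driver thread blocking in select().

```python
import os
import threading
import time
import warnings


def io_loop(stop_event):
    # Stand-in for a driver's I/O thread; a real one would block in select().
    while not stop_event.is_set():
        time.sleep(0.01)


def threads_alive_in_child():
    """Fork with a running background thread and report what the child sees."""
    stop = threading.Event()
    worker = threading.Thread(target=io_loop, args=(stop,), daemon=True)
    worker.start()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn when a multi-threaded process forks.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the thread that called fork() is copied.
        try:
            os.close(read_fd)
            report = bytes([int(worker.is_alive()), threading.active_count()])
            os.write(write_fd, report)
        finally:
            os._exit(0)

    # Parent: read the child's report, then clean up.
    os.close(write_fd)
    report = os.read(read_fd, 2)
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return bool(report[0]), report[1]
```

The function returns whether the child considered the worker thread alive and how many threads it counted. Any file descriptor the thread was servicing is still open in the child, but nothing is left running there to service it.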

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.messaging (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/137651

Mehdi Abaakouk (sileht)
tags: added: qpid
removed: rpc
tags: added: rpc
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/137651
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=eb21f6b2634b976fe65837c52741a6c0c3ef1fe5
Submitter: Jenkins
Branch: master

commit eb21f6b2634b976fe65837c52741a6c0c3ef1fe5
Author: Mehdi Abaakouk <email address hidden>
Date: Thu Nov 27 14:46:52 2014 +0100

    Warn user if needed when the process is forked

    This change warns the library consumer when the process if forked and
    we can't be sure that the library work as expected.

    This also add some documentation about forking oslo.messaging Transport
    object.

    Change-Id: I2938421775aa72866adac198d70214856d45e165
    Related-bug: #1330199
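
One common way to notice that an object is being used in a different process than the one that created it is to record os.getpid() at construction time and compare it later. The sketch below illustrates that general idea only. ForkAwareTransport, forked and send are hypothetical names, and nothing here is taken from the oslo.messaging change itself.

```python
import os
import warnings


class ForkAwareTransport:
    """Hypothetical wrapper that remembers which process created it."""

    def __init__(self):
        self._owner_pid = os.getpid()

    def forked(self):
        # True when the current process is not the one that built this object.
        return os.getpid() != self._owner_pid

    def send(self, message):
        if self.forked():
            warnings.warn(
                "transport was created in another process; "
                "create a new one after fork()",
                RuntimeWarning,
                stacklevel=2,
            )
        return message  # placeholder for real I/O
```

A check like this only warns. The safer pattern, in this sketch's terms, is to construct the transport inside each worker after it has been forked.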

Revision history for this message
Doug Hellmann (doug-hellmann) wrote :

I think the fix mentioned in comment #10 addresses this for oslo.messaging.

Changed in oslo.messaging:
status: Confirmed → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.6.0
status: Fix Committed → Fix Released
Ryan Moats (rmoats)
no longer affects: neutron