zmq: Lack of outbound connection re-use limits scalability with neutron

Bug #1384113 reported by James Page
This bug affects 3 people
Affects         Status        Importance  Assigned to  Milestone
oslo.messaging  Fix Released  Undecided   James Page

Bug Description

During testing of the zmq driver with OpenStack Juno on a 300 compute node cloud, instance creation often failed because nova was unable to reach the neutron API during setup. During setup, neutron sends a number of cast/fanout messages to the neutron-openvswitch-agent instances on the compute nodes. Because the ZMQ driver does no pooling of outbound connections, every message incurs the overhead of setting up a TCP connection; multiply this by the number of edges (300 in this case) and it soon becomes a bottleneck, causing the neutron workers to back up.

The ZMQ driver needs a bit of a redesign to support connection pooling. This might be done by turning the zmq-receiver into a more general broker for both inbound and outbound messaging, providing a single point for pooling and re-use as required and offloading massive fanouts from the OpenStack daemon making the request.

Changed in oslo.messaging:
status: New → Confirmed
Li Ma (nick-ma-z)
Changed in oslo.messaging:
assignee: nobody → Li Ma (nick-ma-z)
James Page (james-page) wrote :

Nick

I have the start of a patch to support connection sharing and re-use for outbound messaging, as well as better use of zmq contexts for message batching etc...

It's testing OK now, so I'll put it up for review sometime next week.

Li Ma (nick-ma-z) wrote :

OK. I'm working on this spec:
https://blueprints.launchpad.net/oslo.messaging/+spec/zmq-socket-reuse

I'm refactoring my code and designing the unit tests. I'm also writing the spec, but I haven't finished it yet.

If you have code available for review, I'd be glad to assign this blueprint to you. As it is more than a bug fix, I suggest you add a spec for review before you submit your change.

What do you think?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/176883

Changed in oslo.messaging:
assignee: Li Ma (nick-ma-z) → James Page (james-page)
status: Confirmed → In Progress
James Page (james-page) wrote :

Nick

I've pushed my WIP up for general review.

I'll also take a run through your spec to compare with my approach.

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/176883
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=de015d5c8308bf5b9cc723c398ae4b91ec814347
Submitter: Jenkins
Branch: master

commit de015d5c8308bf5b9cc723c398ae4b91ec814347
Author: James Page <email address hidden>
Date: Thu Apr 23 17:08:24 2015 +0100

    zmq: Add support for ZmqClient pooling

    To avoid creating a new ZMQ connection for every message sent
    to a remote broker, implement pooling and re-use of ZmqClient
    objects and associated ZMQ context.

    A pool is created for each remote endpoint (keyed by address);
    the size of each pool is configured using rpc_conn_pool_size.

    All outbound message client connections are pooled.

    Closes-Bug: 1384113
    Change-Id: Ia55d5c310a56e51df5e2f5d39e561a4da3fe4d83
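
The per-endpoint pooling described in the commit message can be illustrated with a minimal, standard-library-only sketch. This is not the oslo.messaging code: FakeClient, EndpointPools, the tcp://compute-N:9501 addresses and the "port_update" payload are hypothetical names chosen for illustration, and pool_size only plays the role that rpc_conn_pool_size plays in the commit message. It assumes one bounded pool per remote address, with clients created lazily and returned to the pool after use.

```python
import queue
import threading
from contextlib import contextmanager


class FakeClient:
    """Hypothetical stand-in for a ZmqClient; counts how many were created."""

    created = 0

    def __init__(self, address):
        FakeClient.created += 1
        self.address = address
        self.sent = []

    def cast(self, message):
        # A real client would write to a socket; here we only record the call.
        self.sent.append(message)


class EndpointPools:
    """One bounded pool of clients per remote address (illustrative only)."""

    def __init__(self, pool_size, factory=FakeClient):
        self._pool_size = pool_size
        self._factory = factory
        self._lock = threading.Lock()
        self._idle = {}   # address -> LifoQueue of idle clients
        self._slots = {}  # address -> semaphore capping clients in use

    def _state(self, address):
        # Create the per-address pool state on first use.
        with self._lock:
            if address not in self._idle:
                self._idle[address] = queue.LifoQueue()
                self._slots[address] = threading.BoundedSemaphore(
                    self._pool_size)
            return self._idle[address], self._slots[address]

    @contextmanager
    def client(self, address):
        idle, slots = self._state(address)
        slots.acquire()  # blocks while pool_size clients are already in use
        try:
            try:
                conn = idle.get_nowait()       # re-use an idle client
            except queue.Empty:
                conn = self._factory(address)  # or create one lazily
            try:
                yield conn
            finally:
                idle.put(conn)  # always hand the client back to the pool
        finally:
            slots.release()


# Usage: 5 rounds of casts to 3 hypothetical agents, made sequentially.
pools = EndpointPools(pool_size=2)
hosts = ["tcp://compute-%d:9501" % i for i in range(3)]
for _ in range(5):
    for host in hosts:
        with pools.client(host) as c:
            c.cast({"method": "port_update"})

# 15 casts were served by 3 clients, one per address.
print(FakeClient.created)
```

Because the casts here run one at a time, each address only ever needs a single client; under concurrent senders a pool would grow up to pool_size clients per address and further senders would wait for one to be returned. A real implementation would also need to discard clients whose connection has failed rather than returning them to the pool.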

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.11.0
status: Fix Committed → Fix Released