zmq: Lack of outbound connection re-use limits scalability with neutron

Bug #1384113 reported by James Page on 2014-10-22
This bug affects 3 people
Affects: oslo.messaging
Importance: Undecided
Assigned to: James Page

Bug Description

During testing of the zmq driver with OpenStack Juno on a 300 compute node cloud, instance creation often failed because nova was unable to access the neutron API during setup. During setup, neutron sends a number of cast/fanout messages to the neutron-openvswitch-agent on each compute node. Because the ZMQ driver does no pooling of outbound connections, every message incurs the overhead of setting up a TCP connection; multiply this by the number of edges (300 in this case) and it soon becomes a bottleneck, causing the neutron workers to back up.
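To make the scaling problem concrete, here is a rough sketch that only counts connection setups; it does not use ZMQ or oslo.messaging at all. `CountingTransport`, `fanout_no_reuse` and `fanout_with_reuse` are made-up names for illustration, and the number of messages is an arbitrary assumption (the report only says "a number of" messages).

```python
class CountingTransport:
    """Hypothetical stand-in for a socket layer; it only counts connects."""

    def __init__(self):
        self.connects = 0

    def connect(self, host):
        self.connects += 1
        return host  # the "connection" is just the host name in this sketch


def fanout_no_reuse(transport, hosts, messages):
    """Behaviour described in the report: one new connection per message."""
    for _ in messages:
        for host in hosts:
            transport.connect(host)
            # ...send the message, then drop the connection...


def fanout_with_reuse(transport, hosts, messages):
    """Keep one connection per host and re-use it for later messages."""
    conns = {}
    for _ in messages:
        for host in hosts:
            if host not in conns:
                conns[host] = transport.connect(host)
            # ...send the message over conns[host]...


hosts = ["compute-%d" % i for i in range(300)]  # 300 edges, as in the report
messages = range(5)                             # assumed message count

no_reuse = CountingTransport()
fanout_no_reuse(no_reuse, hosts, messages)

reuse = CountingTransport()
fanout_with_reuse(reuse, hosts, messages)
```

Without re-use the number of connection setups grows with messages times hosts; with re-use it is bounded by the number of hosts.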

The ZMQ driver needs a bit of a re-design to support connection pooling. This might be done by turning the zmq-receiver into a more general broker for both inbound and outbound messaging, providing a single point for pooling and re-use as required, and offloading massive fanouts from the OpenStack daemon making the request.

Changed in oslo.messaging:
status: New → Confirmed
Li Ma (nick-ma-z) on 2015-03-20
Changed in oslo.messaging:
assignee: nobody → Li Ma (nick-ma-z)
James Page (james-page) wrote :

Nick

I have the start of a patch to support connection sharing and re-use for outbound messaging, as well as better use of zmq contexts for message batching etc...

It's testing OK now, so I'll put it up for review sometime next week.

Li Ma (nick-ma-z) wrote :

OK. I'm working on this spec:
https://blueprints.launchpad.net/oslo.messaging/+spec/zmq-socket-reuse

I'm refactoring my code and designing the unit tests. I'm also writing the spec, but I haven't finished it yet.

If you have code available for review, I'm glad to assign this bp to you. As it is more than a bug fix, I suggest you add a spec for review before you submit your review.

What do you think?

Fix proposed to branch: master
Review: https://review.openstack.org/176883

Changed in oslo.messaging:
assignee: Li Ma (nick-ma-z) → James Page (james-page)
status: Confirmed → In Progress
James Page (james-page) wrote :

Nick

I've pushed my WIP up for general review.

I'll also take a run through your spec to compare with my approach.

Reviewed: https://review.openstack.org/176883
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=de015d5c8308bf5b9cc723c398ae4b91ec814347
Submitter: Jenkins
Branch: master

commit de015d5c8308bf5b9cc723c398ae4b91ec814347
Author: James Page <email address hidden>
Date: Thu Apr 23 17:08:24 2015 +0100

    zmq: Add support for ZmqClient pooling

    To avoid creating a new ZMQ connection for every message sent
    to a remote broker, implement pooling and re-use of ZmqClient
    objects and associated ZMQ context.

    A pool is created for each remote endpoint (keyed by address);
    the size of each pool is configured using rpc_conn_pool_size.

    All outbound message client connections are pooled.

    Closes-Bug: 1384113
    Change-Id: Ia55d5c310a56e51df5e2f5d39e561a4da3fe4d83
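For readers who want a picture of what "a pool per remote endpoint, keyed by address" can look like, here is a minimal standard-library sketch. It is not the code from the commit above: `EndpointPools` and `FakeClient` are hypothetical names, `pool_size` merely stands in for the role of `rpc_conn_pool_size`, and no real ZMQ sockets are involved.

```python
import contextlib
import queue
import threading


class EndpointPools:
    """Hypothetical sketch: one bounded pool of clients per remote address."""

    def __init__(self, factory, pool_size):
        self._factory = factory      # callable(address) -> new client object
        self._pool_size = pool_size  # plays the role of rpc_conn_pool_size
        self._lock = threading.Lock()
        self._idle = {}              # address -> LifoQueue of idle clients
        self._slots = {}             # address -> semaphore bounding the pool

    def _state(self, address):
        # Lazily create the pool the first time an address is seen.
        with self._lock:
            if address not in self._idle:
                self._idle[address] = queue.LifoQueue()
                self._slots[address] = threading.BoundedSemaphore(
                    self._pool_size)
            return self._idle[address], self._slots[address]

    @contextlib.contextmanager
    def client(self, address):
        idle, slots = self._state(address)
        slots.acquire()  # blocks while pool_size clients are already in use
        try:
            try:
                conn = idle.get_nowait()       # re-use an idle client
            except queue.Empty:
                conn = self._factory(address)  # otherwise create a new one
            try:
                yield conn
            finally:
                # A real pool would discard broken clients here; this
                # sketch always returns the client for re-use.
                idle.put(conn)
        finally:
            slots.release()


class FakeClient:
    """Hypothetical client that records messages instead of sending them."""
    created = 0

    def __init__(self, address):
        FakeClient.created += 1
        self.address = address
        self.sent = []

    def cast(self, msg):
        self.sent.append(msg)


pools = EndpointPools(FakeClient, pool_size=2)
for i in range(100):
    with pools.client("tcp://compute-1:9501") as c:
        c.cast({"seq": i})
```

In this sketch, sequential sends to the same address keep getting the same client back, and concurrent senders to one address are capped at `pool_size` clients, with further callers blocking until one is returned.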

Changed in oslo.messaging:
status: In Progress → Fix Committed
Changed in oslo.messaging:
milestone: none → 1.11.0
status: Fix Committed → Fix Released