[RFE] Neutron creates too many connections to rabbitmq (NOTE: change this name in accordance with the proposal)

Bug #2007674 reported by Anton V
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
neutron | New | Wishlist | Unassigned |

Bug Description

It is undesirable to keep many TCP connections open at the same time because doing so consumes system resources and makes it more difficult to scale.

For example, the OVS agent opens 15 long-lived connections to RabbitMQ on a fresh start: https://paste.opendev.org/show/br2JxQ7SJkX1Ib2Za3jv/

By comparison, Nova uses 3 connections: https://paste.opendev.org/show/bTeFYCTEVX4Hx3YHwJi2/

The agent creates a separate topic/RPC server for each resource, such as neutron-vo-Network-1.1, neutron-vo-SecurityGroup-1.2, neutron-vo-Port-1.5 and so on.

We could combine these into a couple of RPC servers and reduce the number of connections. oslo.messaging supports a connection pool, configurable through its config options, which manages connections dynamically.
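A toy model of the arithmetic behind this proposal, assuming (as stated later in this thread) that each RPC server holds one dedicated consumer connection. `ToyRpcServer` and the round-robin grouping are hypothetical illustrations, not Neutron or oslo.messaging code; only the three topic names come from this report.

```python
from dataclasses import dataclass


@dataclass
class ToyRpcServer:
    """Hypothetical RPC server: consumes one or more topics over one connection."""
    topics: list[str]
    connections: int = 1  # assumption: one dedicated consumer connection per server


def per_resource_servers(topics: list[str]) -> list[ToyRpcServer]:
    # Pattern described in this report: one RPC server (and connection) per resource topic.
    return [ToyRpcServer([topic]) for topic in topics]


def grouped_servers(topics: list[str], n_servers: int) -> list[ToyRpcServer]:
    # Proposed pattern: spread the same topics round-robin over a fixed number of servers.
    groups = [topics[i::n_servers] for i in range(n_servers)]
    return [ToyRpcServer(group) for group in groups if group]


def total_connections(servers: list[ToyRpcServer]) -> int:
    return sum(server.connections for server in servers)


# Only the topics named in this report; a real agent subscribes to more ("and so on").
TOPICS = ["neutron-vo-Network-1.1", "neutron-vo-SecurityGroup-1.2", "neutron-vo-Port-1.5"]

print(total_connections(per_resource_servers(TOPICS)))  # 3
print(total_connections(grouped_servers(TOPICS, 1)))    # 1
```

The grouped variant only works if a single server can consume several topics, which is not how oslo.messaging RPC servers behave today.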

Environment: OpenStack Yoga, DVR mode

yatin (yatinkarel) wrote:

Thanks for the bug report, Anton. To help triage it, could you also share the following:
- The deployment method used, and your Neutron/Nova configuration
- How much resource consumption you observed from these connections
- What issues you see at scale (and what that scale is)

<< E.g OVS agent creates 15 long-lived connections to rabbitmq from scratch https://paste.opendev.org/show/br2JxQ7SJkX1Ib2Za3jv/
A small correction: there are 14 connections, not 15. I see the same count in some CI logs.
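One way to reproduce such a count on a Linux node, sketched as an assumption (it is not necessarily how the pastes above were produced): parse `/proc/net/tcp` and `/proc/net/tcp6` and count sockets in the ESTABLISHED state (`01`) whose remote port is 5672, RabbitMQ's default AMQP port. This counts every such connection in the current network namespace, not only the agent's; the helper names are hypothetical.

```python
AMQP_PORT = 5672        # RabbitMQ default AMQP port (5671 is the usual TLS port)
TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp and /proc/net/tcp6


def count_amqp_connections(proc_net_tcp_text: str, port: int = AMQP_PORT) -> int:
    """Count ESTABLISHED sockets whose remote port is `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        remote, state = fields[2], fields[3]
        remote_port = int(remote.rsplit(":", 1)[1], 16)  # ports are hex-encoded
        if state == TCP_ESTABLISHED and remote_port == port:
            count += 1
    return count


def count_on_this_host() -> int:
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total += count_amqp_connections(f.read())
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled, or not running on Linux
    return total
```

Attributing the sockets to the agent process would additionally require matching their inode column against the links in `/proc/<pid>/fd`.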

Changed in neutron:
status: New → Incomplete
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote:

Hello Anton:

We think [1] this could be an interesting improvement, so I'll mark it as an RFE.

Whoever takes this bug should first check whether this is possible with the RPC mechanism implemented in Neutron, then propose a spec and present it at the Neutron drivers meeting.

Regards.

[1] https://meetings.opendev.org/meetings/networking/2023/networking.2023-02-21-14.00.log.html#l-86

Changed in neutron:
status: Incomplete → New
importance: Undecided → Critical
importance: Critical → Wishlist
summary: - Neutron creates too many connections to rabbitmq
+ [RFE] Neutron creates too many connections to rabbitmq (NOTE: change
+ this name in accordance with the proposal)
tags: added: rfe
Julien Cosmao (julien-cosmao) wrote (last edit):

Hello,

Same observation here. I started looking into this after recurring issues with the Neutron RabbitMQ cluster in our largest regions (~2000 nodes), which run an OVS deployment with DVR and metadata. Each node has a lot of connections to the broker [1].

Issues mostly appear when agents need to be restarted, or when a problem hits a node of the RabbitMQ cluster (e.g. a cluster partition) and the agents reconnect.

I would also note that the number of queues created by Neutron agents [2] is far too high, and most of them are not even used. During a network partition, for example, the RabbitMQ cluster needs to re-elect a leader for each queue owned by the failed node, and this process is also a problem at scale.

I started working on these topics to meet our infrastructure scaling needs and to reduce stress on the RabbitMQ cluster, because we have had too many outages related to Neutron.

[1] Reduce the number of connections
As Anton says, the agent creates a separate topic/RPC server for each tracked resource (resource cache).
In oslo.messaging, 1 RPC server = 1 topic = 1 connection (pooling is only used for publishing).
For the resource cache, Neutron creates the "same" RPC server multiple times, once per resource.
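A toy model of the distinction drawn above, assuming exactly what this comment states: publishers borrow a connection from a shared pool and give it back, while each RPC server keeps its own connection for as long as it runs. `ToyPool` and `open_connections` are hypothetical, not oslo.messaging classes.

```python
class ToyPool:
    """Hypothetical publishing pool: released connections are reused."""

    def __init__(self) -> None:
        self.idle: list[int] = []
        self.created = 0

    def acquire(self) -> int:
        if self.idle:
            return self.idle.pop()
        self.created += 1
        return self.created  # id of a newly opened connection

    def release(self, conn: int) -> None:
        self.idle.append(conn)


def open_connections(n_publishes: int, n_rpc_servers: int) -> int:
    pool = ToyPool()
    for _ in range(n_publishes):
        conn = pool.acquire()  # borrow for one cast/call
        pool.release(conn)     # return it right away
    # Each RPC server holds its own consumer connection outside the pool.
    return pool.created + n_rpc_servers


print(open_connections(n_publishes=1000, n_rpc_servers=7))  # 8
```

In this model, tuning the pool only changes the first term; the second grows with every RPC server, so a pool alone does not reduce the consumer connections.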

Here, I see 2 solutions for reducing the number of connections:
- Allow associating 1 RPC server with multiple topics in oslo.messaging, so that a single connection can consume from multiple queues. This change could be proposed to the oslo.messaging project.
- Reduce the number of different topics, i.e. declare only 1 topic per common purpose on the Neutron side (resource cache, q-agent-notifier). This would require more changes to how Neutron implements RPC.
For the resource cache, for example, we would go from 7 connections to only 1.
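The first option can be sketched as one consumer that subscribes to several queues over a single connection and dispatches each message by the queue it arrived on. `ToyConnection` and `MultiTopicServer` are hypothetical; oslo.messaging does not currently offer this, and the in-memory inbox stands in for a real broker connection.

```python
from collections import deque
from typing import Callable


class ToyConnection:
    """Hypothetical single broker connection delivering (queue, payload) pairs."""

    def __init__(self) -> None:
        self.subscriptions: list[str] = []
        self.inbox: deque = deque()

    def consume(self, queue: str) -> None:
        self.subscriptions.append(queue)

    def deliver(self, queue: str, payload: str) -> None:
        if queue in self.subscriptions:  # unsubscribed queues are ignored
            self.inbox.append((queue, payload))


class MultiTopicServer:
    """One connection, many topics: route each message to its topic's handler."""

    def __init__(self, handlers: dict[str, Callable[[str], None]]) -> None:
        self.connection = ToyConnection()
        self.handlers = handlers
        for topic in handlers:
            self.connection.consume(topic + "_fanout")  # resource cache only uses fanout

    def poll(self) -> int:
        handled = 0
        while self.connection.inbox:
            queue, payload = self.connection.inbox.popleft()
            self.handlers[queue.removesuffix("_fanout")](payload)
            handled += 1
        return handled


received: list[str] = []
server = MultiTopicServer({
    "neutron-vo-Network-1.1": lambda payload: received.append("Network:" + payload),
    "neutron-vo-Port-1.5": lambda payload: received.append("Port:" + payload),
})
server.connection.deliver("neutron-vo-Port-1.5_fanout", "updated")
server.connection.deliver("neutron-vo-Network-1.1_fanout", "updated")
print(server.poll(), received)  # 2 ['Port:updated', 'Network:updated']
```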

[2] Reduce the number of queues
When 1 RPC server is declared, oslo.messaging creates 1 connection, declares 3 queues, and starts listening on them:
- topic_fanout
- topic
- topic.host
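Using the queue names from this list (the actual fanout queue name may carry an extra unique suffix), the queues per RPC server and their total across a fleet can be computed as below. The numbers are the ones quoted in this thread (7 resource-cache servers per agent, ~2000 nodes), assuming one agent per node and that the plain topic queue is shared by all agents while the other two are per agent. Both helpers are hypothetical.

```python
def queues_for_server(topic: str, host: str) -> list[str]:
    # The three queues listed above: fanout, shared topic, and per-host topic.
    return [f"{topic}_fanout", topic, f"{topic}.{host}"]


def fleet_queue_count(topics: int, agents: int) -> int:
    # Assumption: topic_fanout and topic.host exist once per agent, while the
    # plain topic queue is shared by every agent consuming that topic.
    per_agent = 2 * topics
    shared = topics
    return per_agent * agents + shared


print(queues_for_server("neutron-vo-Port-1.5", "compute-1"))
print(fleet_queue_count(topics=7, agents=2000))  # 28007
```

In this model, 14000 of those queues are topic.host queues that the resource cache never reads from.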

In most cases, only 1 of those queues is used by Neutron (e.g. the resource cache uses only fanout, so the neutron-vo-RESOURCE.hostxxx queues are not used). When an RPC server is declared, an oslo.messaging Target describing the topic is passed with a fanout=bool attribute. We could use that on the oslo.messaging side to declare only the needed queues on the backend, and so avoid having every agent declare extra queues.
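A sketch of that idea under a hypothetical policy for the flag: fanout=True declares only the fanout queue, fanout=False only the two direct queues, and an unset flag keeps all three for backward compatibility. `ToyTarget` only mirrors the `topic`, `server` and `fanout` attribute names of `oslo_messaging.Target`; `queues_to_declare` is not an oslo.messaging function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyTarget:
    """Stand-in mirroring the topic/server/fanout attributes of oslo_messaging.Target."""
    topic: str
    server: str
    fanout: Optional[bool] = None


def queues_to_declare(target: ToyTarget) -> list[str]:
    fanout_queue = f"{target.topic}_fanout"
    direct_queues = [target.topic, f"{target.topic}.{target.server}"]
    if target.fanout is True:
        return [fanout_queue]              # e.g. the resource cache: fanout only
    if target.fanout is False:
        return direct_queues               # round-robin and host-directed calls only
    return [fanout_queue] + direct_queues  # flag unset: today's behaviour, all three
```

Under this policy, a resource-cache server such as `ToyTarget("neutron-vo-Port-1.5", "compute-1", fanout=True)` declares 1 queue instead of 3.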

With a few changes in oslo.messaging, the number of connections and queues could be reduced easily.

What do you think?

tags: added: rfe-approved