L2-population fanout-cast leads to performance and scalability issues

Bug #1337717 reported by Chaoyi Huang
This bug affects 3 people
Affects: neutron
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

https://github.com/osrg/quantum/blob/master/neutron/plugins/ml2/drivers/l2pop/rpc.py

def _notification_fanout(self, context, method, fdb_entries):
    ....
    # Publish the FDB entries to every agent subscribed to the l2pop update topic.
    self.fanout_cast(context,
                     self.make_msg(method, fdb_entries=fdb_entries),
                     topic=self.topic_l2pop_update)

fanout_cast publishes the message to all L2 agents listening on the "l2population" topic.

If there are 1000 agents (a small cloud) and all of them listen on the "l2population" topic, adding one new port leads to 1000 sub-messages. RabbitMQ can generally handle about 10k messages per second, so the fanout_cast method causes serious performance problems and makes the neutron service hard to scale: the achievable concurrency of VM port requests will be very small.

No matter how many ports are in the subnet, performance depends on the number of L2 agents listening on the topic.
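The arithmetic behind this claim can be sketched as follows. This is only a back-of-the-envelope model that uses the figures quoted above (10k messages per second, 1000 agents); the function name is hypothetical and real broker throughput depends on many other factors.

```python
def max_port_updates_per_second(broker_msgs_per_second: int,
                                subscribed_agents: int) -> float:
    """Upper bound on port updates per second when every update is
    fanned out as one message per subscribed agent."""
    if subscribed_agents <= 0:
        raise ValueError("subscribed_agents must be positive")
    return broker_msgs_per_second / subscribed_agents


# Figures taken from the report: 10k msg/s broker, 1000 listening agents.
print(max_port_updates_per_second(10_000, 1000))  # 10.0
```

Under these assumptions the bound shrinks linearly as agents are added, independent of how many ports the subnet holds.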

The way to solve the performance and scalability issue is to have each L2 agent listen on a topic related to a network, for example by using the network UUID as the topic. If one port is activated in the subnet, only those agents hosting VMs on the same network should receive the L2-pop message. This is a partial mesh, which was the original design purpose but is not implemented yet.
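A minimal sketch of the per-network topic idea, assuming a topic name derived from the network UUID. Everything here is illustrative: `network_topic`, `TopicRegistry` and the `"l2population.<network>"` naming scheme are hypothetical and are not part of the neutron RPC API; the registry is an in-memory stand-in for broker subscriptions.

```python
from collections import defaultdict

BASE_TOPIC = "l2population"  # topic name quoted in the report


def network_topic(network_id: str) -> str:
    # Hypothetical naming scheme: one topic per network UUID.
    return f"{BASE_TOPIC}.{network_id}"


class TopicRegistry:
    """In-memory stand-in for broker subscriptions (illustration only)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(set)

    def subscribe(self, agent: str, network_id: str) -> None:
        # An agent subscribes when it starts hosting a VM on the network.
        self._subscribers[network_topic(network_id)].add(agent)

    def unsubscribe(self, agent: str, network_id: str) -> None:
        # An agent unsubscribes when its last VM on the network is gone.
        self._subscribers[network_topic(network_id)].discard(agent)

    def recipients(self, network_id: str) -> set:
        # Only agents subscribed to this network's topic get the update.
        return set(self._subscribers.get(network_topic(network_id), set()))


registry = TopicRegistry()
for i in range(3):
    registry.subscribe(f"agent-{i}", "net-a")      # 3 agents host net-a
for i in range(3, 1000):
    registry.subscribe(f"agent-{i}", "net-b")      # the rest host net-b

# A port update on net-a reaches 3 agents instead of all 1000.
print(len(registry.recipients("net-a")))  # 3
```

With this scheme the message count per port update tracks the number of agents hosting the network rather than the total number of agents in the cloud.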

tags: added: loadimpact
removed: l2
Eugene Nikanorov (enikanorov) wrote :

The problem described in the bug seems to be a new feature needed to improve performance at scale.
It can't really be considered a bug because the described behavior is as designed.

I suggest working on this problem within the scope of an appropriate blueprint.

Changed in neutron:
importance: Undecided → Medium
status: New → Opinion
Steve Ruan (ruansx) wrote :

Hi Eugene,

Our team had an internal BGP EVPN exchange project, which is the right solution for this bug.
Could you please assign this bug to me?

Thanks.

Steve Ruan (ruansx)
Changed in neutron:
assignee: nobody → steve (ruansx)
assignee: steve (ruansx) → nobody
Armando Migliaccio (armando-migliaccio) wrote :

We'd need to come up with a backward-compatible strategy to handle the change in topic subscription. However, this isn't just a problem for l2pop.

Changed in neutron:
status: Opinion → Confirmed
importance: Medium → Low
Armando Migliaccio (armando-migliaccio) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which lead to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (INCUBATOR-JUNO, LIBERTY, MITAKA, NEWTON).
  Valid example: CONFIRMED FOR: INCUBATOR-JUNO

Changed in neutron:
importance: Low → Undecided
status: Confirmed → Expired