[RFE] neutron cells aware

Bug #1690425 reported by Armando Migliaccio
This bug affects 4 people
Affects: neutron
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

Cells are an option for scaling OpenStack deployments. It would be nice to make Neutron able to work with the individual cell DB/MQ clusters rather than relying (as it does to this day) on a global DB/MQ cluster. This would help with scaling message traffic as well as with fault tolerance.

[1] https://etherpad.openstack.org/p/pike-neutron-multi-site

Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
tags: added: rfe
Tim Bell (tim-bell) wrote :

Is it necessary that the nova cells and neutron cells have the same scope? I was wondering about an ML2-based neutron cell, whereas nova cells tend to be grouped more by hardware type (although that is not guaranteed).

Armando Migliaccio (armando-migliaccio) wrote :

This specific proposal does not suggest introducing a first-class cell concept in Neutron, but rather making Neutron aware of the nova cells where the compute workloads affected by networking events actually run. This has the benefit of spreading control messages over the per-cell DB/MQ clusters rather than overloading a single 'central' MQ/DB, as happens today. For this reason, I suspect that there is no need for a neutron cell abstraction per se (and thus no concern about mapping of scopes), at least in the first iteration of this effort, but I think we should at least be open to the idea in case we're missing other important use cases.
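
As a rough illustration of the idea only: the sketch below picks a per-cell message queue for each compute host and falls back to the global one when the host's cell is unknown. It is not an existing Neutron or nova API. Every name, host, and URL in it (Cell, CELLS, HOST_TO_CELL, GLOBAL_TRANSPORT_URL, transport_for_host, group_by_transport) is a hypothetical placeholder, and how Neutron would actually learn the host-to-cell mapping is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical record for one nova cell and its own message queue."""
    name: str
    transport_url: str


# Hypothetical sample data; a real deployment would obtain this from nova.
CELLS = {
    "cell1": Cell("cell1", "rabbit://mq.cell1.example:5672/"),
    "cell2": Cell("cell2", "rabbit://mq.cell2.example:5672/"),
}
HOST_TO_CELL = {
    "compute-01": "cell1",
    "compute-02": "cell1",
    "compute-03": "cell2",
}
GLOBAL_TRANSPORT_URL = "rabbit://mq.global.example:5672/"


def transport_for_host(host):
    """Return the MQ URL of the cell that owns `host`, else the global MQ."""
    cell_name = HOST_TO_CELL.get(host)
    if cell_name is None:
        return GLOBAL_TRANSPORT_URL
    return CELLS[cell_name].transport_url


def group_by_transport(hosts):
    """Group target hosts by MQ so each cluster only carries its own traffic."""
    grouped = {}
    for host in hosts:
        grouped.setdefault(transport_for_host(host), []).append(host)
    return grouped
```

With this kind of lookup, a control message for ports on compute-01 and compute-03 would be sent once to each cell's MQ instead of once through a single central MQ.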

Chaoyi Huang (joehuang) wrote :

Neutron + Tricircle can work with cells v2 multi-cell to address the MQ/DB challenge: http://lists.openstack.org/pipermail/openstack-dev/2017-May/117599.html . A 1:1 mapping is not required, though it is possible.

Armando Migliaccio (armando-migliaccio) wrote :

Let's agree on next steps here.

Changed in neutron:
status: Confirmed → Triaged
Armando Migliaccio (armando-migliaccio) wrote :

For scaling reasons, the approach shown in [1] looks really cool.

[1] https://www.youtube.com/watch?v=R0fwHr8XC1I

Miguel Lavalle (minsel) wrote :

Update after meeting on 10/12/2017. Decided to take 3 actions:

1) Dig deeper into the messaging approach
2) Do fact-finding about Tricircle
3) Add a section to the Networking Guide outlining the possible approaches

Miguel Lavalle (minsel) wrote :

As a reference, this is a high-level description of cells V2: https://docs.openstack.org/nova/pike/user/cellsv2_layout.html

Miguel Lavalle (minsel) wrote :

Tricircle is not in production yet. OpenStack Cascading, its predecessor, is in production in 5 public clouds.

Miguel Lavalle (minsel) wrote :

Members of the team will attend the "Cells V2 update and direction" forum session in Sydney: http://forumtopics.openstack.org/cfp/details/60. In that session we will also seek feedback from operators migrating to Cells V2 about the bottlenecks they are finding in Neutron.

Miguel Lavalle (minsel) wrote :

I had a conversation with Kris Lindgren of GoDaddy about this topic during the flight back from Sydney. In summary, he would see a lot of benefit if we were able to improve the performance of the RPC bus by using message routing, as suggested in comment #5. He also said, though, that they would see a lot of benefit in moving to a configuration where cells are reflected in Neutron for failure-domain isolation purposes.

Miguel Lavalle (minsel) wrote :

It was agreed in the latest drivers meeting that the pertinent next step is to develop a prototype following the approach suggested in #5 above. The aim is to assess the feasibility of the approach and help uncover unforeseen problems.

Miguel Lavalle (minsel)
tags: added: rfe-triaged
Miguel Lavalle (minsel) wrote :

The presentation pointed to in comment #5 is about the Apache Qpid Dispatch Router. Its page is here: http://qpid.apache.org/components/dispatch-router/index.html

Miguel Lavalle (minsel) wrote :

Performance measurement of Qpid Dispatch Router and RabbitMQ in the context of OpenStack: https://www.youtube.com/watch?v=xGTW3FvJYI4. This is the presentation: https://www.openstack.org/assets/presentation-media/Vancouver-summit-Openstack-internal-messaging-in-depth-evaluation2.pdf

Miguel Lavalle (minsel)
tags: added: rfe-confirmed
removed: rfe-triaged
Slawek Kaplonski (slaweq) wrote :

Is there still any interest in working on this? Does anyone want to continue it?
Maybe it's not needed if we move to ML2/OVN, where RabbitMQ isn't used at all?

Slawek Kaplonski (slaweq) wrote :

As discussed at the PTG, we decided to close this RFE for now. Feel free to reopen it if there is a valid use case for it and you want to work on it. Then we can discuss it again in the drivers team meeting.

Changed in neutron:
status: Triaged → Opinion
tags: added: rfe-postponed
removed: rfe rfe-confirmed