The agent/server communication pattern we use now can lead to cascading failures that make the servers unavailable.
The current pattern of communication between the Neutron server and the agents looks like the following:

1. Server sends a notification: item <item-uuid> changed.
2. Agent receives the event.
3. Agent makes a call back to the server asking for the item details.
The calls the agent makes back to the server can be expensive, and a server under heavy load can take a long time to start processing a request and/or to fulfill it. This can trigger a timeout on the agent side, which leads to a retry or, even worse, a generic fallback that resyncs the entire state. The result is a thundering herd: a server that falls behind on requests is continually stampeded by retries from agents whose calls have already timed out by the time the server can respond.
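A minimal sketch of this failure mode, with made-up helper names (`rpc_call`, `resync_all`) standing in for the real agent plumbing:

```python
RPC_TIMEOUT = 10  # seconds the agent waits for a reply


class RpcTimeout(Exception):
    """Raised when the server does not answer in time."""


def rpc_call(method, *args, timeout=RPC_TIMEOUT):
    # Stand-in for the real RPC client; here we simulate an overloaded
    # server that never answers before the timeout fires.
    raise RpcTimeout(method)


def resync_all():
    print("falling back to a full state resync -- expensive!")


def handle_item_changed(item_uuid):
    # Current pattern: the notification carries only the UUID, so the
    # agent must call back to the server for the actual details.
    try:
        details = rpc_call('get_item_details', item_uuid)
        print("applied update:", details)
    except RpcTimeout:
        # Generic fallback: resync the entire state. When many agents
        # time out at once, they all do this simultaneously, stampeding
        # the already-overloaded server (the thundering herd).
        resync_all()


handle_item_changed("some-item-uuid")
```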
At a minimum, the pattern of agent/server communication needs to be adjusted to assume terrible server response times. Optimally, every notification generated by the server should include all of the information an agent needs to respond to the event, so that the only time an agent actually has to call the server is on startup, to fetch the initial state.
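As a sketch (the field names are invented, not Neutron's actual payload schema), the difference between the two notification styles might look like this:

```python
def apply_update(state):
    print("applying:", state)


def handle_event(event, fetch_details):
    if "state" in event:
        # "Fat" notification: the payload already carries everything
        # the agent needs, so no call back to the server is required.
        apply_update(event["state"])
    else:
        # "Thin" notification: forces a round-trip to a possibly
        # overloaded server just to learn what changed.
        apply_update(fetch_details(event["id"]))


thin_event = {"event": "port.changed", "id": "<item-uuid>"}
fat_event = {
    "event": "port.changed",
    "id": "<item-uuid>",
    "state": {
        "admin_state_up": True,
        "fixed_ips": ["10.0.0.5"],
        "qos_policy_id": None,  # extension data travels with the event too
    },
}

handle_event(fat_event, fetch_details=None)  # no server round-trip needed
```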
Yes, we need a good RPC mechanism to avoid the need to sync back to the Neutron server.
And we should probably implement back-off / circuit-breaker patterns on requests back to Neutron to mitigate cascading failures.
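A rough sketch of what that could look like on the agent side; the thresholds and the `TimeoutError`-raising `call` are illustrative assumptions, not existing Neutron code:

```python
import random
import time


class CircuitBreaker:
    """Stop calling the server entirely after repeated failures,
    then let a probe through again after a cool-down period."""

    def __init__(self, failure_threshold=5, cooldown=60):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: allow one probe once the cool-down has elapsed.
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


def call_with_backoff(call, breaker, retries=4, base_delay=1.0):
    """Retry with exponential back-off and jitter instead of
    hammering a server that is already falling behind."""
    for attempt in range(retries):
        if not breaker.allow():
            raise RuntimeError("circuit open: skipping call to neutron")
        try:
            result = call()
            breaker.record_success()
            return result
        except TimeoutError:
            breaker.record_failure()
            # Jittered exponential back-off desynchronizes the herd.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise TimeoutError("giving up after %d attempts" % retries)
```

The jitter matters as much as the back-off: without it, agents that timed out together retry together, recreating the stampede.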
If we had a single source of incremental IDs we could rely on (redis INCR [1], or any abstracted client), we could tag RPC messages with monotonically increasing IDs to avoid the out-of-order issues. Alternatively, we could timestamp resources in the DB to make sure we always keep the latest update of an object, but that gets complicated for composite objects: for example, when we add "qos_policy_id" to a port by extending it.
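For instance, with redis-py (the key name and message shape here are hypothetical), the server could stamp every notification from a shared counter and the agents could drop anything older than what they have already applied:

```python
import redis

r = redis.Redis()  # the shared, single source of incremental IDs


# Server side: stamp each notification with a globally increasing ID.
def publish_update(resource_id, payload):
    seq = r.incr("neutron:update-seq")  # atomic increment, never repeats
    return {"resource_id": resource_id, "seq": seq, "payload": payload}


# Agent side: track the newest sequence applied per resource and
# silently drop messages that were reordered in transit.
last_seen = {}


def handle_update(msg):
    rid, seq = msg["resource_id"], msg["seq"]
    if seq <= last_seen.get(rid, 0):
        return  # stale update arriving out of order; ignore it
    last_seen[rid] = seq
    print("applying:", msg["payload"])
```

Note that the counter only needs to be monotonic, not gap-free: the per-resource comparison on the agent side is what guarantees the latest update wins.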