Resync OVS, L3, DHCP agents upon revival
Bug #1505166 reported by Eugene Nikanorov
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| neutron | Fix Released | High | Eugene Nikanorov | |
Bug Description
In some cases, on a loaded cloud where neutron runs over RabbitMQ in clustered mode, one of the RabbitMQ cluster members can get stuck replicating queues.
During that period, agents connected through that member cannot communicate with neutron-server or send heartbeats.
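The server can only notice the lost heartbeats by comparing the time of the last report against a threshold. The sketch below illustrates that check. The function name is hypothetical, and the 75-second threshold is an assumption loosely modelled on neutron's `agent_down_time` option, not a value taken from this bug.

```python
from datetime import datetime, timedelta

# Assumed threshold, loosely modelled on neutron's agent_down_time option.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_agent_down(last_heartbeat: datetime, now: datetime,
                  down_time: timedelta = AGENT_DOWN_TIME) -> bool:
    """Hypothetical helper: True if no heartbeat arrived within down_time."""
    return now - last_heartbeat > down_time


if __name__ == "__main__":
    t0 = datetime(2015, 10, 12, 12, 0, 0)
    # Heartbeats stop while the RabbitMQ node is stuck replicating queues.
    print(is_agent_down(t0, t0 + timedelta(seconds=30)))   # False
    print(is_agent_down(t0, t0 + timedelta(seconds=120)))  # True
```

The agent itself may be perfectly healthy here; only its message path is blocked, which is why the server's verdict can later turn out to be wrong.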
In that case, neutron-server reschedules resources away from such agents. After that, when RabbitMQ finishes syncing, the agents "revive" but do nothing to clean up the resources that were rescheduled during their "sleep".
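A minimal sketch of the server-side sequence, assuming a simple in-memory registry: an agent is marked dead, its resources are moved round-robin to live agents, and its next heartbeat is answered with a "revived" status so it knows its local state may be stale. `AgentRegistry`, `report_state` and the `ALIVE`/`REVIVED` strings are hypothetical illustrations, not neutron's actual classes or the change proposed in the linked review.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

ALIVE = "alive"
REVIVED = "revived"  # hypothetical status telling the agent to resync


@dataclass
class AgentRecord:
    host: str
    alive: bool = True
    resources: Set[str] = field(default_factory=set)  # e.g. router/network IDs


class AgentRegistry:
    """Hypothetical server-side bookkeeping, for illustration only."""

    def __init__(self) -> None:
        self.agents: Dict[str, AgentRecord] = {}

    def register(self, host: str) -> None:
        self.agents.setdefault(host, AgentRecord(host))

    def mark_dead_and_reschedule(self, host: str) -> None:
        """Mark an agent dead and move its resources to live agents."""
        dead = self.agents[host]
        dead.alive = False
        live = [a for a in self.agents.values() if a.alive]
        if not live:
            return  # nowhere to reschedule; keep the existing bindings
        for i, res in enumerate(sorted(dead.resources)):
            live[i % len(live)].resources.add(res)
        dead.resources.clear()

    def report_state(self, host: str) -> str:
        """Handle a heartbeat; flag agents that were considered dead."""
        agent = self.agents[host]
        if not agent.alive:
            agent.alive = True
            return REVIVED  # its resources may have moved while it was away
        return ALIVE


if __name__ == "__main__":
    reg = AgentRegistry()
    for h in ("node-1", "node-2"):
        reg.register(h)
    reg.agents["node-1"].resources = {"router-a", "router-b"}
    reg.mark_dead_and_reschedule("node-1")
    print(sorted(reg.agents["node-2"].resources))  # ['router-a', 'router-b']
    print(reg.report_state("node-1"))  # revived
    print(reg.report_state("node-1"))  # alive
```

Without something like the `REVIVED` answer, node-1 would keep its local router configuration for router-a and router-b even though node-2 now hosts them, which is the conflicting state described in this report.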
As a result, some resources can be left in a failed or conflicting state (DHCP/router namespaces, ports with binding_failed).
They should be either deleted or synchronized with the server state.
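The "delete or synchronize" step amounts to a set reconciliation between what the agent has configured locally and what the server currently assigns to it. The sketch below assumes both sides can be expressed as sets of resource IDs; `plan_resync`, `full_resync` and the `delete`/`create` callbacks are hypothetical names standing in for whatever the real agent would do to tear down or build a namespace.

```python
from typing import Callable, Iterable, Set, Tuple


def plan_resync(local: Iterable[str],
                assigned: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (stale, missing) resource IDs for a hypothetical full resync."""
    local_set, assigned_set = set(local), set(assigned)
    stale = local_set - assigned_set    # rescheduled away during the "sleep"
    missing = assigned_set - local_set  # assigned here but not configured
    return stale, missing


def full_resync(local: Iterable[str], assigned: Iterable[str],
                delete: Callable[[str], None],
                create: Callable[[str], None]) -> None:
    """Delete stale resources first, then create the missing ones."""
    stale, missing = plan_resync(local, assigned)
    for res in sorted(stale):
        delete(res)
    for res in sorted(missing):
        create(res)


if __name__ == "__main__":
    deleted, created = [], []
    full_resync({"router-a", "router-b"}, {"router-b", "router-c"},
                deleted.append, created.append)
    print(deleted, created)  # ['router-a'] ['router-c']
```

Deleting stale resources before creating missing ones is a design choice in this sketch: it frees anything the conflicting copies might still hold before new ones are built.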
Changed in neutron:

| Field | Change |
|---|---|
| importance | Undecided → Medium |
| importance | Medium → High |
| status | Fix Committed → Fix Released |
| tags | removed: kilo-backport-potential liberty-backport-potential |
Fix proposed to branch: master
Review: https://review.openstack.org/233557