Stopping neutron agent containers also brings down dataplane services
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Brent Eagles |
Bug Description
(Do not confuse with https:/
tl;dr: services like metadata, dhcp, routing, etc. are significantly impacted when the related neutron agents are stopped in a containerized deployment. This is a regression with respect to baremetal deployments.
From mailing list:
"The neutron agents are implemented in such a way that key functionality is implemented in terms of haproxy, dnsmasq, keepalived and radvd configuration. The agents manage instances of these services but, by design, the parent is the top-most (pid 1).
On baremetal this has the advantage that, while control plane changes cannot be made while the agents are not available, the configuration at the time the agents were stopped will work (for example, VMs that are restarted can request their IPs, etc). In short, the dataplane is not affected by shutting down the agents.
In the TripleO containerized version of these agents, the supporting processes (haproxy, dnsmasq, etc.) are run within the agent's container so when the container is stopped, the supporting processes are also stopped. That is, the behavior with the current containers is significantly different than on baremetal and stopping/restarting containers effectively breaks the dataplane. At the moment this is being considered a blocker and unless we can find a resolution, we may need to recommend running the L3, DHCP and metadata agents on baremetal."
This problem is exacerbated by the fact that neutron is not container-aware: it does not currently support launching containers for these processes, nor can it directly monitor them.
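As a rough illustration of the baremetal behavior described above (this is not neutron's code), the sketch below starts a stand-in "agent" that launches a long-lived helper in its own session and then exits. The helper, a placeholder for something like dnsmasq or haproxy, is re-parented and keeps running after the agent is gone. All names here (`AGENT_SOURCE`, `run_agent_and_stop`, `pid_alive`, `stop_helper`) are hypothetical, and the sketch assumes a POSIX system with Python 3.7 or later.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for an agent: it launches a long-lived helper
# (think dnsmasq or haproxy) in its own session, prints the helper's PID,
# and then exits, as if the agent had been stopped.
AGENT_SOURCE = """
import subprocess, sys
helper = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    start_new_session=True,          # detach from the agent's session
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print(helper.pid)
"""


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def run_agent_and_stop():
    """Run the stand-in agent to completion and return the helper's PID."""
    result = subprocess.run(
        [sys.executable, "-c", AGENT_SOURCE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


def stop_helper(pid):
    """Clean up the orphaned helper once the demonstration is done."""
    os.kill(pid, signal.SIGTERM)


if __name__ == "__main__":
    helper_pid = run_agent_and_stop()
    # The agent has already exited here; the helper is still running.
    print("helper still running after agent exit:", pid_alive(helper_pid))
    stop_helper(helper_pid)
```

When the same agent runs inside a container, stopping the container tears down every process in it, so a helper started this way would not survive. That difference is the regression this bug describes.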
Changed in tripleo:
importance: Critical → High
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Changed in tripleo:
status: Triaged → In Progress
tags: added: queens-backport-potential
Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
status: In Progress → Fix Released
Marking as "critical" for now. While it does not affect CI it has project-wide implications with respect to planning, etc.