Stopping neutron agent containers also brings down dataplane services

Bug #1749209 reported by Brent Eagles
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Brent Eagles

Bug Description

(Do not confuse with https://bugs.launchpad.net/neutron/+bug/1748658 which is specific to the persistence of network namespaces)

tl;dr: services like metadata, DHCP, routing, etc. are significantly impacted when the related neutron agents are stopped in a containerized deployment. This is a regression with respect to baremetal deployments.

From mailing list:
"The neutron agents are implemented in such a way that key functionality is implemented in terms of haproxy, dnsmasq, keepalived and radvd configuration. The agents manage instances of these services but, by design, the parent is the top-most (pid 1).

On baremetal this has the advantage that, while control plane changes cannot be made while the agents are not available, the configuration at the time the agents were stopped will work (for example, VMs that are restarted can request their IPs, etc). In short, the dataplane is not affected by shutting down the agents.

In the TripleO containerized version of these agents, the supporting processes (haproxy, dnsmasq, etc.) are run within the agent's container so when the container is stopped, the supporting processes are also stopped. That is, the behavior with the current containers is significantly different than on baremetal and stopping/restarting containers effectively breaks the dataplane. At the moment this is being considered a blocker and unless we can find a resolution, we may need to recommend running the L3, DHCP and metadata agents on baremetal."

This problem is exacerbated by the fact that neutron is not container-aware: it does not currently support launching containers for these processes, nor can it directly monitor them.
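
To illustrate the baremetal behavior described in the mailing list excerpt, here is a minimal Python sketch. It is an analogy, not neutron's actual process management: a hypothetical stand-in "agent" starts a helper in its own session and then exits, and the helper keeps running afterwards (reparented to pid 1 or a subreaper). In a container, stopping the container would take the helper down with it.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for an agent: it starts a helper in its own session
# (so the helper is not tied to the agent's lifetime), prints the helper's
# pid, and exits immediately.
AGENT_SOURCE = """
import subprocess, sys
helper = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    start_new_session=True,
    stdin=subprocess.DEVNULL,
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
)
print(helper.pid)
"""


def pid_is_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def launch_agent_and_wait_for_exit():
    """Run the stand-in agent to completion and return the helper's pid."""
    result = subprocess.run(
        [sys.executable, "-c", AGENT_SOURCE],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    pid = launch_agent_and_wait_for_exit()
    print("helper alive after agent exit:", pid_is_alive(pid))
    os.kill(pid, signal.SIGTERM)  # clean up the helper
```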

Revision history for this message
Brent Eagles (beagles) wrote :

Marking as "critical" for now. While it does not affect CI it has project-wide implications with respect to planning, etc.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Brent Eagles (beagles)
milestone: none → queens-rc1
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

IIUC, haproxy, dnsmasq, keepalived and radvd are not the neutron agents' child processes, right? I'm reminded of the ironic HA topic, where that was exactly the case, hence the question. If those used to be just system daemons, we should implement them as separate containers running within the neutron agents' "pods": K8s pods once we're there, and otherwise just containers sharing Linux namespaces et al.

Revision history for this message
Brent Eagles (beagles) wrote :

While not children of the agents PID-wise, they are in some sense "subprocesses". Neutron starts them, monitors whether they are running, cleans them up when they are no longer needed, etc. The processes are also run in specific network namespaces that neutron creates in order to provide tenant isolation etc. For example, if I have two subnets there will be two dnsmasq instances, two haproxy instances for metadata, etc.
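
A rough sketch of what "one instance per namespace" looks like, in Python. It only builds command lines and runs nothing. It assumes the DHCP agent's `qdhcp-<network id>` namespace naming; the dnsmasq options and the pid-file path are illustrative, not the agent's real invocation, and the network ids are made up.

```python
def dhcp_namespace(network_id):
    # Assumption: the DHCP agent names its namespaces "qdhcp-<network id>".
    return f"qdhcp-{network_id}"


def namespaced_command(namespace, command):
    # "ip netns exec <namespace> <command...>" runs the command inside
    # the given network namespace.
    return ["ip", "netns", "exec", namespace, *command]


def dnsmasq_commands(network_ids):
    """Build one dnsmasq command line per network, each in its own namespace."""
    commands = {}
    for network_id in network_ids:
        dnsmasq = [
            "dnsmasq",
            "--no-hosts",
            # Illustrative path only; the real location is deployment-specific.
            f"--pid-file=/var/lib/neutron/dhcp/{network_id}/pid",
        ]
        commands[network_id] = namespaced_command(
            dhcp_namespace(network_id), dnsmasq
        )
    return commands


if __name__ == "__main__":
    for network_id, command in dnsmasq_commands(["net-a", "net-b"]).items():
        print(network_id, " ".join(command))
```

Because each instance is tied to a namespace that neutron itself creates and cleans up, moving these processes elsewhere means whatever launches them still needs access to those namespaces.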

Revision history for this message
Brent Eagles (beagles) wrote :

There was verbal discussion on this and some good points were brought up:
1. this isn't tripleo specific but is common to all kolla containers
2. if HA is properly configured, the HA functionality should take care of this within reason
3. a service can be signaled without restarting the entire container if necessary
4. without neutron being "container aware" there are limits to what can be achieved in the short term
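
For point 3, a minimal sketch of signaling a containerized service instead of restarting its container, using the `docker kill --signal` CLI option from Python. The container name `neutron_dhcp` is an assumption for illustration, and whether the agent acts on the signal depends on how the image's entrypoint starts it, since the signal goes to the container's main process.

```python
import subprocess


def signal_container_command(container, sig="HUP"):
    """Build a docker command that sends a signal to a container's main process."""
    return ["docker", "kill", f"--signal={sig}", container]


def signal_container(container, sig="HUP"):
    # Sends the signal without stopping or restarting the container,
    # so other processes inside it keep running.
    subprocess.run(signal_container_command(container, sig), check=True)


if __name__ == "__main__":
    # "neutron_dhcp" is a hypothetical container name.
    print(" ".join(signal_container_command("neutron_dhcp")))
```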

Brought up separately with respect to (2.) is that while failover should handle the normal operational case, operators restarting agents for tweaking and debugging is something that we see often so this would impact them. Perhaps (3.) will sufficiently address this.

It's generally agreed that we need to document how restarting containers differs from restarting services on baremetal, and how to adapt practices to containerized deployments. I've filed a bug for this: https://bugs.launchpad.net/tripleo/+bug/1750578.

Brent Eagles (beagles)
Changed in tripleo:
importance: Critical → High
Changed in tripleo:
milestone: queens-rc1 → rocky-1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/549855

Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/549855
Reason: clear up the gate to merge CI blockers, I'll restore the patches.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/549855
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=66ad3f8d1975966b644e704d5e0a538b0fbd4eaa
Submitter: Zuul
Branch: master

commit 66ad3f8d1975966b644e704d5e0a538b0fbd4eaa
Author: Brent Eagles <email address hidden>
Date: Mon Mar 5 15:52:51 2018 -0330

    Add docker packages to dhcp and l3 agent containers

    The dhcp and l3 neutron agent containers need to be able to launch
    docker containers for their subprocesses.

    Change-Id: I8d93f4eccde1dc6e55e10399184ee80671355769
    Related-Bug: #1749209

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/561650

Brent Eagles (beagles)
tags: added: queens-backport-potential
Changed in tripleo:
milestone: rocky-1 → rocky-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/561650
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=04d8bc53123946ac72f0f73145e6a0c201f34771
Submitter: Zuul
Branch: stable/queens

commit 04d8bc53123946ac72f0f73145e6a0c201f34771
Author: Brent Eagles <email address hidden>
Date: Mon Mar 5 15:52:51 2018 -0330

    Add docker packages to dhcp and l3 agent containers

    The dhcp and l3 neutron agent containers need to be able to launch
    docker containers for their subprocesses.

    Change-Id: I8d93f4eccde1dc6e55e10399184ee80671355769
    Related-Bug: #1749209
    (cherry picked from commit 66ad3f8d1975966b644e704d5e0a538b0fbd4eaa)

tags: added: in-stable-queens
Revision history for this message
Brent Eagles (beagles) wrote :

Should've been marked as critical as it is a significant regression from ocata/baremetal and neutron's HA features are basically broken.

Changed in tripleo:
importance: High → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/564191

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/564191
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=86117a7ce86ac2a2f0ae74bd907abac292d4d751
Submitter: Zuul
Branch: master

commit 86117a7ce86ac2a2f0ae74bd907abac292d4d751
Author: Brent Eagles <email address hidden>
Date: Wed Apr 25 09:52:59 2018 -0230

    Fix missing docker package in neutron L3 agent container

    We need to use the append macro to add the docker package to the L3 agent
    container.

    Note: the dhcp image should really be doing the same thing, but requires
    a change to the kolla repo and will be addressed in follow-up patches.
    (See https://bugs.launchpad.net/kolla/+bug/1766863)

    Change-Id: Ib2d2ad4960ea34ec9e3fca1eeb322742341f7eb7
    Related-Bug: #1749209

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/564483

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/564483
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c5d3a943bc8f7b5ea95d5ef8e3ecfd4765563c05
Submitter: Zuul
Branch: stable/queens

commit c5d3a943bc8f7b5ea95d5ef8e3ecfd4765563c05
Author: Brent Eagles <email address hidden>
Date: Wed Apr 25 09:52:59 2018 -0230

    Fix missing docker package in neutron L3 agent container

    We need to use the append macro to add the docker package to the L3 agent
    container.

    Note: the dhcp image should really be doing the same thing, but requires
    a change to the kolla repo and will be addressed in follow-up patches.
    (See https://bugs.launchpad.net/kolla/+bug/1766863)

    Change-Id: Ib2d2ad4960ea34ec9e3fca1eeb322742341f7eb7
    Related-Bug: #1749209
    (cherry picked from commit 86117a7ce86ac2a2f0ae74bd907abac292d4d751)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/550224
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=015c9b757af33e7c2c07d932c45145e1c09ca733
Submitter: Zuul
Branch: master

commit 015c9b757af33e7c2c07d932c45145e1c09ca733
Author: Brent Eagles <email address hidden>
Date: Fri Mar 9 17:32:34 2018 -0330

    Adding wrapper scripts for neutron agent subprocesses

    The neutron agents use subprocesses like dnsmasq and keepalived as part
    of their implementation. Running these "subprocesses" in separate
    containers prevent dataplane breakages/unnecessary failover on agent
    container restart.

    Also amends docker daemon options to allow including additional unix
    domain sockets to bind to the docker daemon. The paths can be mounted by
    containers that launch containers instead of mounting /run/docker.sock.
    This avoids issues if the docker daemon is restarted while the containers
    are running.

    Related-Bug: #1749209
    Change-Id: Icd4c24ac686d957391548a04722266cefc1bce27
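
The wrapper idea in the commit message above can be sketched roughly as follows. This is not the merged wrapper script, only a Python illustration of the approach: instead of executing the helper directly, the wrapper asks the docker daemon, over a dedicated unix socket, to run it in its own container. The socket path, image name, container name, and the particular `docker run` options chosen here are all assumptions.

```python
def sidecar_run_command(process_args, image, name, docker_socket):
    """Build a docker command that runs a helper process in its own container.

    docker_socket is an additional unix socket the daemon is assumed to
    listen on, mounted into the agent container instead of /run/docker.sock.
    """
    return [
        "docker", "--host", f"unix://{docker_socket}",
        "run", "--detach", "--name", name,
        # Assumed options: share the host network stack and make the
        # neutron-created network namespaces visible to the new container.
        "--network", "host",
        "--privileged",
        "--volume", "/run/netns:/run/netns:shared",
        image,
        *process_args,
    ]


if __name__ == "__main__":
    command = sidecar_run_command(
        ["ip", "netns", "exec", "qdhcp-net-a", "dnsmasq", "--no-hosts"],
        image="example/neutron-dhcp-agent:latest",       # hypothetical image
        name="neutron-dnsmasq-qdhcp-net-a",              # hypothetical name
        docker_socket="/var/lib/openstack/docker.sock",  # hypothetical path
    )
    print(" ".join(command))
```

Since the helper's container is started by the daemon rather than as a child inside the agent's container, stopping or restarting the agent container leaves it running.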

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/566559

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/queens)

Reviewed: https://review.openstack.org/566559
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=3d822ef2241b71127e5aface0e3384be0e938db1
Submitter: Zuul
Branch: stable/queens

commit 3d822ef2241b71127e5aface0e3384be0e938db1
Author: Brent Eagles <email address hidden>
Date: Fri Mar 9 17:32:34 2018 -0330

    Adding wrapper scripts for neutron agent subprocesses

    The neutron agents use subprocesses like dnsmasq and keepalived as part
    of their implementation. Running these "subprocesses" in separate
    containers prevent dataplane breakages/unnecessary failover on agent
    container restart.

    Also amends docker daemon options to allow including additional unix
    domain sockets to bind to the docker daemon. The paths can be mounted by
    containers that launch containers instead of mounting /run/docker.sock.
    This avoids issues if the docker daemon is restarted while the containers
    are running.

    Conflicts:
        manifests/profile/base/docker.pp

    Related-Bug: #1749209
    Change-Id: Icd4c24ac686d957391548a04722266cefc1bce27
    (cherry picked from commit 015c9b757af33e7c2c07d932c45145e1c09ca733)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/550823
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=b02273765240be34cd02c9c0832b3af256e3aba8
Submitter: Zuul
Branch: master

commit b02273765240be34cd02c9c0832b3af256e3aba8
Author: Brent Eagles <email address hidden>
Date: Wed Mar 7 09:34:55 2018 -0330

    Generate and mount wrappers for neutron agent processes

    The neutron agents use things like dnsmasq and keepalived as part of
    their implementation. Running these "subprocesses" in separate
    containers prevent dataplane breakages/unnecessary failover on agent
    container restart. This patch triggers the creation and mounting of
    wrappers for launching these processes in containers.

    Related-Bug: #1749209
    Depends-On: Icd4c24ac686d957391548a04722266cefc1bce27
    Depends-On: I8d93f4eccde1dc6e55e10399184ee80671355769
    Depends-On: Ib2d2ad4960ea34ec9e3fca1eeb322742341f7eb7
    Change-Id: Iea53489c916765bcfd88d7d12e6a32e1b6276d81

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/567196

Brent Eagles (beagles)
Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/570942

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/571130

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/567196
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=ae90e24146cbeb5af3f42e9e2e01cf842984cd9f
Submitter: Zuul
Branch: stable/queens

commit ae90e24146cbeb5af3f42e9e2e01cf842984cd9f
Author: Brent Eagles <email address hidden>
Date: Wed Mar 7 09:34:55 2018 -0330

    Generate and mount wrappers for neutron agent processes

    The neutron agents use things like dnsmasq and keepalived as part of
    their implementation. Running these "subprocesses" in separate
    containers prevent dataplane breakages/unnecessary failover on agent
    container restart. This patch triggers the creation and mounting of
    wrappers for launching these processes in containers.

    Related-Bug: #1749209
    Depends-On: Icd4c24ac686d957391548a04722266cefc1bce27
    Depends-On: I8d93f4eccde1dc6e55e10399184ee80671355769
    Depends-On: Ib2d2ad4960ea34ec9e3fca1eeb322742341f7eb7
    Change-Id: Iea53489c916765bcfd88d7d12e6a32e1b6276d81
    (cherry picked from commit b02273765240be34cd02c9c0832b3af256e3aba8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/571672

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/571672
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=2458872776f748101e19684a4bcfe1dc82d96360
Submitter: Zuul
Branch: master

commit 2458872776f748101e19684a4bcfe1dc82d96360
Author: Daniel Alvarez <email address hidden>
Date: Fri Jun 1 11:44:22 2018 +0200

    Add docker packages to OVN metadata agent container

    The OVN metadata agent container needs to be able to launch docker
    containers for its subprocesses.

    Change-Id: Ic7dcbe33f361a09eaa8e54270d758d09eb7a1e72
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/570942
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=965aba275eb06af0f5f1550176dd458653c3b33b
Submitter: Zuul
Branch: master

commit 965aba275eb06af0f5f1550176dd458653c3b33b
Author: Daniel Alvarez <email address hidden>
Date: Tue May 29 16:07:24 2018 +0200

    Adding wrapper script for haproxy in OVN metadata agent

    Change-Id: Ieb5618ec96539bb07d9d01b5c6d27da962f65156
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to puppet-tripleo (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/581749

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/581750

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/581750
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=451aace65e0271eb374a92c045c93c8680c27be2
Submitter: Zuul
Branch: stable/queens

commit 451aace65e0271eb374a92c045c93c8680c27be2
Author: Daniel Alvarez <email address hidden>
Date: Fri Jun 1 11:44:22 2018 +0200

    Add docker packages to OVN metadata agent container

    The OVN metadata agent container needs to be able to launch docker
    containers for its subprocesses.

    Change-Id: Ic7dcbe33f361a09eaa8e54270d758d09eb7a1e72
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit 2458872776f748101e19684a4bcfe1dc82d96360)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/571130
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=662814ed1c121a7177c95b4a774d1c1fae686087
Submitter: Zuul
Branch: master

commit 662814ed1c121a7177c95b4a774d1c1fae686087
Author: Daniel Alvarez <email address hidden>
Date: Wed May 30 10:10:02 2018 +0200

    Generate and mount wrappers for haproxy in OVN metadata agent

    OVN metadata agent uses haproxy as part of its implementation.
    Running it in a separate container prevents dataplane breakages
    (ie. restarting VMs or spawning new ones) on agent restart/stop.
    This patch triggers the creation of such sidecar container and
    mounting of haproxy wrapper for spawning it in a separate
    container.

    Change-Id: I59e08384080cda0b6c0f03c9ed8fb6f6a5661e6b
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/591298

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to puppet-tripleo (stable/queens)

Reviewed: https://review.openstack.org/581749
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=9f27b65bfdf0b1761e8c93ce10d5986af3d2e86e
Submitter: Zuul
Branch: stable/queens

commit 9f27b65bfdf0b1761e8c93ce10d5986af3d2e86e
Author: Daniel Alvarez <email address hidden>
Date: Tue May 29 16:07:24 2018 +0200

    Adding wrapper script for haproxy in OVN metadata agent

    Change-Id: Ieb5618ec96539bb07d9d01b5c6d27da962f65156
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit 965aba275eb06af0f5f1550176dd458653c3b33b)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/591298
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1a079fe0b9fdfd832b32bb0729f25686da7bf51b
Submitter: Zuul
Branch: stable/queens

commit 1a079fe0b9fdfd832b32bb0729f25686da7bf51b
Author: Daniel Alvarez <email address hidden>
Date: Wed May 30 10:10:02 2018 +0200

    Generate and mount wrappers for haproxy in OVN metadata agent

    OVN metadata agent uses haproxy as part of its implementation.
    Running it in a separate container prevents dataplane breakages
    (ie. restarting VMs or spawning new ones) on agent restart/stop.
    This patch triggers the creation of such sidecar container and
    mounting of haproxy wrapper for spawning it in a separate
    container.

    Conflicts:
        docker/services/ovn-metadata.yaml
        puppet/services/docker.yaml

    Change-Id: I59e08384080cda0b6c0f03c9ed8fb6f6a5661e6b
    Related-Bug: #1749209
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry-picked from commit 662814ed1c121a7177c95b4a774d1c1fae686087)
