Bug #1825943 “[RFE] Implement “stop agent” hook script support i...” : Bugs : neutron

Cédric Jeanneret (cjeanner) on 2019-04-23

affects:	tripleo → neutron
Changed in neutron:
milestone:	train-1 → none
status:	Triaged → New

Slawek Kaplonski (slaweq) on 2019-04-23

tags:	added: rfe
Changed in neutron:
importance:	Medium → Wishlist

Revision history for this message

Miguel Lavalle (minsel) wrote on 2019-04-26:

#1

Hi Cedric,

Thanks for your submission. Of course we always consider proposals to improve Neutron. Having said that, I have a couple of questions about your proposal:

1) Shouldn't this cleanup be the responsibility of the containerization framework? Have you contemplated that possibility? Why wouldn't the containerization framework be capable of handling the cleanup? Why do we have to resort to Neutron?

2) You also mention "hooks" in your proposal. What would be the mechanism to register such hooks? An API?

Revision history for this message

Cédric Jeanneret (cjeanner) wrote on 2019-04-26:

#2

Hello Miguel,

Thank you for your interest. Here are the answers:

- Relying on the containerization framework:
Not really possible, since the "framework" doesn't exist: it's plain docker or podman calls. The only way to get those containers cleaned would be to run some heuristic job (like a garbage collector).

Running such a job is a bad idea, since it might delete containers that are really crashed, preventing an easy way to get them back (in TripleO, that would more or less require a new deploy run).
We might of course keep track of the containers in a file, "managed" by the current agent wrapper scripts, but that would be ugly, and might lead to issues if the file get corrupted for some reasons.

Since Neutron is already capable of creating and starting a container (via a wrapper script), it should also be able to stop it cleanly, and remove it, instead of keeping dangling things in place. That would also prevent misreading in case of debug.
For instance, we detected those dangling containers in OpenStack CI, because a new validation was added. This validation is checking for failed containers or services and, depending on when we run this validation, it pointed the Neutron metadata agent container (it was done after Tempest runs).

- Hook registration:
I'm talking about "hooks" in order to be wider than just "delete containers". Neutron already has a mechanism for the agent start allowing to use a wrapper script[1] - my idea/request is to make the same thing in order to stop the agent.
Being able to use a hook, or to sticking with the current nomenclature, a "wrapper script" would allow to do all the needed things, and even more.

Since the current "start" wrappers aren't added via an API, I suppose the "stop" wrappers/hooks don't need this feature. I didn't check how Neutron currently manages the start part, but in my mind, it's something like:
"does a script exist for this action? if so, execute it - if not, just kill the pid".

Does it answer your questions? Feel free to ping me if you need more details :).

Cheers,

C.

[1] those script are deployed during the installation. An example for such a file is here:
https://github.com/openstack/puppet-tripleo/blob/master/templates/neutron/haproxy.epp
It's created/managed by this class:
https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/wrappers/haproxy.pp

Hello Miguel,

Thank you for your interest. Here are the answers:

- Relying on the containerization framework:
Not really possible, since the "framework" doesn't exist: it's plain docker or podman calls. The only way to get those containers cleaned would be to run some heuristic job (like a garbage collector).

Running such a job is a bad idea, since it might delete containers that are really crashed, preventing an easy way to get them back (in TripleO, that would more or less require a new deploy run).
We might of course keep track of the containers in a file, "managed" by the current agent wrapper scripts, but that would be ugly, and might lead to issues if the file get corrupted for some reasons.

Since Neutron is already capable of creating and starting a container (via a wrapper script), it should also be able to stop it cleanly, and remove it, instead of keeping dangling things in place. That would also prevent misreading in case of debug.
For instance, we detected those dangling containers in OpenStack CI, because a new validation was added. This validation is checking for failed containers or services and, depending on when we run this validation, it pointed the Neutron metadata agent container (it was done after Tempest runs).

- Hook registration:
I'm talking about "hooks" in order to be wider than just "delete containers". Neutron already has a mechanism for the agent start allowing to use a wrapper script[1] - my idea/request is to make the same thing in order to stop the agent.
Being able to use a hook, or to sticking with the current nomenclature, a "wrapper script" would allow to do all the needed things, and even more.

Since the current "start" wrappers aren't added via an API, I suppose the "stop" wrappers/hooks don't need this feature. I didn't check how Neutron currently manages the start part, but in my mind, it's something like:
"does a script exist for this action? if so, execute it - if not, just kill the pid".

Does it answer your questions? Feel free to ping me if you need more details :).

Cheers,

C.

[1] those script are deployed during the installation. An example for such a file is here:
https://github.com/openstack/puppet-tripleo/blob/master/templates/neutron/haproxy.epp
It's created/managed by this class:
https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/wrappers/haproxy.pp

Revision history for this message

Slawek Kaplonski (slaweq) wrote on 2019-04-26:

#3

I agree with Cedric here. We are currently simply send "kill" (-9 in some cases) to the process, even if it is in fact container.
What we can IMHO do is provide e.g. config option (or options, one per type of process managed by neutron, like e.g. one for dnsmasq, one for keepalived and so on) that will allow to set some "kill script". If it would be set, it would be used instead of kill command to stop such processes.

I think that maybe this can somehow help also to solve problem with our too wide rootwrap kill filters for python processes, as we can then add similar "killer scripts" for e.g. "python neutron-keepalived-state-change-monitor" process.

Revision history for this message

Miguel Lavalle (minsel) wrote on 2019-05-09:

#4

@Cedric,

What do you think of Slawek's proposal? Does this address your use case? Can you add details to Slawek's proposal

Revision history for this message

Cédric Jeanneret (cjeanner) wrote on 2019-05-10:

#5

Hello Miguel,

well... that's more or less what I propose. Call it "kill script", "hook", as long as we get a way to stop and clean up containers instead of keeping them dangling with a bad exit status, I'm OK :).

Cheers,

C.

Revision history for this message

Brent Eagles (beagles) wrote on 2019-05-10:

#6

I agree with Slaweq's proposal of having separate config options for each type of process. Certain processes continue to run in the same container as the respective agent and others processes might require special service specific cleanups.

Miguel Lavalle (minsel) on 2019-05-16

tags:

added: rfe-triaged
removed: rfe

Revision history for this message

Miguel Lavalle (minsel) wrote on 2019-05-17:

#7

The neutron drivers team discussed and approved this RFE

tags:	added: rfe-approved removed: rfe-triaged
Changed in neutron:
assignee:	nobody → Slawek Kaplonski (slaweq)
status:	New → Confirmed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-28: Fix proposed to neutron (master)

#8

Fix proposed to branch: master
Review: https://review.opendev.org/661760

Changed in neutron:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) on 2019-05-31

Changed in neutron:
assignee:	Slawek Kaplonski (slaweq) → Brian Haley (brian-haley)

OpenStack Infra (hudson-openstack) on 2019-05-31

Changed in neutron:
assignee:	Brian Haley (brian-haley) → Slawek Kaplonski (slaweq)

OpenStack Infra (hudson-openstack) on 2019-05-31

Changed in neutron:
assignee:	Slawek Kaplonski (slaweq) → Brian Haley (brian-haley)

OpenStack Infra (hudson-openstack) on 2019-06-03

Changed in neutron:
assignee:	Brian Haley (brian-haley) → Slawek Kaplonski (slaweq)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-06-05: Fix merged to neutron (master)

#9

Reviewed: https://review.opendev.org/661760
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=93015527f02cb54b0b2bc07c8c9c239c63878264
Submitter: Zuul
Branch: master

commit 93015527f02cb54b0b2bc07c8c9c239c63878264
Author: Slawek Kaplonski <email address hidden>
Date: Mon May 27 13:17:28 2019 +0200

Add kill hooks for external processes

This patch adds possibility to configure kill hooks used to kill
external processes, like dnsmasq or keepalived.

Change-Id: I29dfbedfb7167982323dcff1c4554ee780cc48db
Closes-Bug: #1825943

Changed in neutron:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-06-05: Fix proposed to neutron (stable/stein)

#10

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/663289

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-06-08: Fix merged to neutron (stable/stein)

#11

Reviewed: https://review.opendev.org/663289
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=702d4b321447d94598e1962cb67a7da968fa5fba
Submitter: Zuul
Branch: stable/stein

commit 702d4b321447d94598e1962cb67a7da968fa5fba
Author: Slawek Kaplonski <email address hidden>
Date: Mon May 27 13:17:28 2019 +0200

Add kill hooks for external processes

This patch adds possibility to configure kill hooks used to kill
external processes, like dnsmasq or keepalived.

Conflicts:
neutron/tests/functional/agent/l3/framework.py

    Change-Id: I29dfbedfb7167982323dcff1c4554ee780cc48db
    Closes-Bug: #1825943
    (cherry picked from commit 93015527f02cb54b0b2bc07c8c9c239c63878264)

tags:

added: in-stable-stein

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-07-01: Fix included in openstack/neutron 14.0.2

#12

This issue was fixed in the openstack/neutron 14.0.2 release.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-09-19: Fix included in openstack/neutron 15.0.0.0b1

#13

This issue was fixed in the openstack/neutron 15.0.0.0b1 development milestone.

neutron

[RFE] Implement "stop agent" hook script support in Neutron

Bug Description

Other bug subscribers

Remote bug watches