[RFE] Support cleanup of all resources associated with a given tenant

Bug #1511574 reported by John Davidge
This bug affects 1 person
Affects               Status        Importance  Assigned to   Milestone
python-neutronclient  Fix Released  Wishlist    John Davidge

Bug Description

In the ops painpoints session (https://etherpad.openstack.org/p/mitaka-neutron-next-ops-painpoints) a problem was identified where removing a tenant can leave behind stray routers, ports, etc.

It was suggested that a simple 'neutron purge <tenant_id>' command or similar would simplify the process of cleaning up these stray resources by removing everything associated with the given tenant.

The expectation is that this command would be admin-only, and neutron should not be responsible for deciding whether the action is 'safe'. It should work regardless of whether the given tenant is active or not.

This suggestion was very popular with the operators in the room. The consensus was that this would save a lot of time and effort where currently these resources have to be discovered and then removed one by one.

John Davidge (john-davidge) wrote :

I'd be happy to volunteer for this once the RFE is approved.

Cedric Brandily (cbrandily) wrote :

Could ospurge[1] address (at least partially) your needs?

[1] https://github.com/openstack/ospurge

Henry Gessau (gessau)
Changed in neutron:
status: New → Confirmed
tags: added: usability
Henry Gessau (gessau)
tags: added: ops
John Davidge (john-davidge) wrote :

Yes, ospurge seems to meet the exact needs described by operators in the session. Some mentioned having scripts in place to do this already (although I don't remember ospurge being named specifically). The desire was that these functions become a part of the neutron API itself. Functionality similar to ospurge's --dry-run option was asked for as well.

The features already offered in ospurge would be a great starting point to discuss the scope of this RFE. Thanks for pointing it out, Cedric.

Changed in neutron:
assignee: nobody → Satyanarayana Patibandla (satya-patibandla)
John Davidge (john-davidge) wrote :

@Satyanarayana This RFE requires approval before an assignee is decided. I'll assign myself for now until the discussion can take place.

Changed in neutron:
assignee: Satyanarayana Patibandla (satya-patibandla) → nobody
assignee: nobody → John Davidge (john-davidge)
Henry Gessau (gessau) wrote :
Changed in neutron:
importance: Undecided → Wishlist
summary: - Support cleanup of all resources associated with a given tenant with a
- single API call
+ [RFE] Support cleanup of all resources associated with a given tenant
+ with a single API call
Assaf Muller (amuller) wrote : Re: [RFE] Support cleanup of all resources associated with a given tenant with a single API call

This is a cross-project concern and should be addressed as such. I don't want to introduce a new API endpoint to Neutron without getting cross-project consensus on that approach.

Armando Migliaccio (armando-migliaccio) wrote :

I am not sure we'll ever get cross-project consensus, to be honest.

We can rely on something like ospurge, that cleans up tenant resources from a project by using the project's API, but this may lead to a lot of chattiness, and to potential unrecoverable errors.

If we assumed that we can have an API exposed by Neutron that purges the resources from a single tenant, this can be done a lot more effectively. It's handy to have it, but not strictly necessary. If someone were to propose an implementation (as a service plugin perhaps), we could look at its merits.

Changed in neutron:
status: Confirmed → Triaged
Armando Migliaccio (armando-migliaccio) wrote :

To be discussed at the drivers meeting.

Assaf Muller (amuller) wrote :

I don't want to merge a new project-purge API to Neutron without first at least having a discussion with the other major projects.

Henry Gessau (gessau) wrote :

We have a volunteer (John Davidge). It might be that neutron is the first to add a "purge_resources_owned_by(tenant_id)" API, but that shouldn't stop us. We should definitely get cross-project agreement on how such an API should look/behave so that other projects can align when they get around to their implementations. I am willing to be the approver for this effort if we can agree to go for it. I got the sense that operators will love the feature.

Armando Migliaccio (armando-migliaccio) wrote :

We should allow this work to proceed, but with the caveat that we do so by ensuring that we don't paint ourselves into a corner should a cross-project initiative take off. I'll let Assaf chime in on the rest. More discussion in [1].

[1] http://eavesdrop.openstack.org/meetings/neutron_drivers/2015/neutron_drivers.2015-12-01-15.00.log.html

tags: added: rfe-approved
removed: rfe
Assaf Muller (amuller) wrote :

First I'd like to link my previous work on this subject:
https://review.openstack.org/#/q/topic:bp/tenant-delete,n,z

As Armando said very nicely in today's drivers meeting, it was 'ahead of its time'. Although that effort was abandoned, we can learn a few things from it. First, on the practical level, here's the code that actually deleted resources:

https://review.openstack.org/#/c/92600/5/neutron/services/identity/project_deleted.py

If you scroll all the way down you can see (roughly) the correct order in which we would delete resources. One thing to note is shared resources: if the tenant being deleted has a shared network with ports on it from other tenants, you wouldn't be able to delete that network. You should fail gracefully, skip it, and continue to delete other resources (the code doesn't currently do that).
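To make the ordering and the "skip and continue" behavior concrete, here is a minimal Python sketch. It assumes a client object exposing python-neutronclient v2.0 style `list_*`/`delete_*` methods plus `remove_interface_router`; the `purge_project` helper and the exact deletion order are illustrative assumptions, not the order used in the linked patch.

```python
def purge_project(neutron, project_id):
    """Delete one project's resources in an assumed dependency order.

    `neutron` is assumed to behave like neutronclient.v2_0.client.Client.
    Failures (e.g. a shared network still carrying other tenants' ports)
    are recorded and skipped instead of aborting the whole run.
    Returns (deleted, skipped).
    """
    deleted, skipped = [], []

    def attempt(kind, res_id, action):
        try:
            action()
            deleted.append((kind, res_id))
        except Exception as exc:  # log it, skip it, keep going
            skipped.append((kind, res_id, str(exc)))

    # Floating IPs first, since they can keep a router in use.
    for fip in neutron.list_floatingips(tenant_id=project_id)["floatingips"]:
        attempt("floatingip", fip["id"],
                lambda i=fip["id"]: neutron.delete_floatingip(i))

    # Router interface ports are detached through the router, not delete_port.
    interfaces = neutron.list_ports(
        tenant_id=project_id, device_owner="network:router_interface")["ports"]
    for port in interfaces:
        attempt("router_interface", port["id"],
                lambda p=port: neutron.remove_interface_router(
                    p["device_id"], {"port_id": p["id"]}))

    # Then routers, remaining ports, networks (subnets go with their
    # network) and finally security groups.
    for kind, lister, deleter in (
        ("router", "list_routers", "delete_router"),
        ("port", "list_ports", "delete_port"),
        ("network", "list_networks", "delete_network"),
        ("security_group", "list_security_groups", "delete_security_group"),
    ):
        for res in getattr(neutron, lister)(tenant_id=project_id)[kind + "s"]:
            attempt(kind, res["id"],
                    lambda i=res["id"], d=deleter: getattr(neutron, d)(i))

    return deleted, skipped
```

Recording skips rather than raising lets an operator see exactly which shared resources were left behind.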

The more interesting discussion is about the high-level approach taken by that patch series, which is:
1) Listen on the RPC bus for Keystone tenant deletion messages (back when I worked on it, those were actually not emitted by default; you had to configure Keystone to do that. I don't know if this has changed since).
2) In case the neutron-server(s) was/were not up when that notification was sent, provide a CLI script that invokes the (same) tenant-deletion code.
3) The same script would optionally reach out to Keystone and write a list of resources that existed in Neutron but were owned by tenants that no longer exist according to Keystone. The script could also take such a list (persisted to a file) and actually delete those resources. Persisting this information to files is useful for ops for auditing reasons, and for integrating with billing systems, for example.
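Step 3 above can be sketched as follows. The input shapes (a dict mapping resource type to resource dicts that carry `tenant_id`, and a set of live project IDs obtained from Keystone by some other means) and the JSON file format are assumptions made for illustration.

```python
import json
from pathlib import Path


def find_orphans(resources, live_project_ids):
    """Return, per resource type, the resources whose owner no longer exists.

    `resources`: {"ports": [{"id": ..., "tenant_id": ...}, ...], ...}
    `live_project_ids`: set of project IDs Keystone still reports.
    """
    return {
        kind: [r for r in items if r["tenant_id"] not in live_project_ids]
        for kind, items in resources.items()
    }


def write_audit_file(orphans, path):
    # Persist the list so ops can audit it (or feed it to billing)
    # before anything is actually deleted.
    Path(path).write_text(json.dumps(orphans, indent=2, sort_keys=True))


def read_audit_file(path):
    # A later run can load exactly this list and delete only what it names.
    return json.loads(Path(path).read_text())
```

Splitting "find" from "delete" via a file means the deletion step acts on a reviewed list rather than on a fresh, possibly different, query.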

There are alternatives to the approach taken by that patch series:
1) One such alternative is to offer a script (os-purge, already mentioned in this bug) that would contain all code by all projects that offer tenant-deletion code.
2) All projects will offer an API that accepts a tenant-id and deletes all resources that belong to that tenant. Ops would have to create a small script that accepts a tenant-id, then calls nova project-purge <id>, neutron project-purge <id>, etc. We could of course modify os-purge to use each project's project-purge as those are implemented (i.e. os-purge would call neutron project-purge instead of implementing that logic on its own). Side note: *aaS and other Neutron stadium projects that maintain resources are an interesting problem. Would they introduce their code to the main Neutron repo, or would Neutron offer pluggability to project-purge?
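The small per-project script in option 2 might look roughly like the sketch below. No `project-purge` command existed in any client at the time, so each service's purge is modeled as a hypothetical callable; the point is only that one service failing should not stop the others.

```python
from typing import Callable, Dict


def purge_everywhere(project_id: str,
                     purgers: Dict[str, Callable[[str], None]]) -> Dict[str, str]:
    """Run every service's (hypothetical) purge for one project.

    `purgers` maps a service name to a callable standing in for e.g. a
    future `neutron project-purge <id>`. Dict order is preserved, so
    callers can list compute before networking. Returns service -> result.
    """
    results = {}
    for service, purge in purgers.items():
        try:
            purge(project_id)
            results[service] = "ok"
        except Exception as exc:  # record the failure, carry on with the rest
            results[service] = f"failed: {exc}"
    return results
```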

Ultimately both 1 and 2 are simpler than my patch series and I would advise against listening on the RPC bus. It's a nice idea in theory but it's too complicated because you need to write the sync code anyway.

One advantage of the 2nd approach over the 1st is that each project is more likely to maintain higher-quality, up-to-date code in its own repo than out of repo.

I propose an alteration on the 2nd approach. Call it approach 2'. I'm afraid that until someone (John?) proposes a cross-project spec and follows up on the discussion, we might find ourselves in a position where we're the first ...

Armando Migliaccio (armando-migliaccio) wrote :

Assaf, thanks for the nicely written comment.

So, to try to distill it a bit and ensure we're on the same page, I'd like to say that for the sake of this effort:

* We should not pursue an end-to-end integration strategy: Neutron should provide the ability to purge project resources that is invoked by an external party.
* The smallest unit of cleanup is a project, aka tenant in existing Neutron parlance, rather than domain, users and whatnot.

To your summary:

1) File a cross-project spec.
2) Write the code, expose it via CLI, not API.
3) If 1 succeeds, expose the already merged code via the API.

I see the following open issues:

* If no REST API is going to be provided to start with, the tool must be invoked from the node where a single server is running, no? This sounds pretty limiting to me, but if acceptable, we should make this clear upfront. Not sure what we'd gain by not exposing the API, since operators would still use this in the interim and switching to something different is still going to be painful. The difference in this case is that we'd make their life miserable by asking them to ssh into one of the boxes, when they could neatly do that from their home laptop connected to the admin URL. It's either that, or I am grossly missing something and I am making a fool of myself :)
* We can't expect the submitter of the Neutron-side purge code to be the same person who files a cross-project spec. That requires a different set of skills, i.e. we need a seasoned dude like yourself to help champion it through. Are you prepared to do the legwork?
* We should be careful about the chain of dependencies between known and unknown (extension) resources, so the framework must be planned to allow for hooks and all.
* Testing: the tool can easily break if we don't test it continuously, in that a commit that changes a dependency, or a policy rule change, can easily invalidate the dependency chain, and that can prevent the deletion, leading to stale resources. This also brings me to the next point.
* purge is intrinsically an async operation, therefore the API should be task-oriented, where the user can check the operation status, whether it has completed successfully, whether there are some resources still behind etc.
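The task-oriented shape suggested in the last point could look roughly like this. `PurgeTaskManager`, the status names and the `purge_fn` contract (it returns whatever it could not delete) are hypothetical; a real API would persist task state rather than keep it in memory.

```python
import enum
import threading
import uuid


class PurgeStatus(enum.Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    COMPLETED_WITH_LEFTOVERS = "COMPLETED_WITH_LEFTOVERS"
    FAILED = "FAILED"


class PurgeTaskManager:
    """Hypothetical task-oriented purge: start() returns an ID to poll."""

    def __init__(self, purge_fn):
        # purge_fn(project_id) -> iterable of resources it could not delete
        self._purge_fn = purge_fn
        self._tasks = {}
        self._threads = {}
        self._lock = threading.Lock()

    def start(self, project_id):
        task_id = str(uuid.uuid4())
        worker = threading.Thread(target=self._run, args=(task_id, project_id))
        with self._lock:
            self._tasks[task_id] = {"status": PurgeStatus.PENDING,
                                    "leftovers": [], "error": None}
            self._threads[task_id] = worker
        worker.start()
        return task_id

    def status(self, task_id):
        # Return a copy so callers never see a half-updated record.
        with self._lock:
            return dict(self._tasks[task_id])

    def wait(self, task_id, timeout=None):
        self._threads[task_id].join(timeout)

    def _update(self, task_id, **fields):
        with self._lock:
            self._tasks[task_id].update(fields)

    def _run(self, task_id, project_id):
        self._update(task_id, status=PurgeStatus.RUNNING)
        try:
            leftovers = list(self._purge_fn(project_id))
        except Exception as exc:
            self._update(task_id, status=PurgeStatus.FAILED, error=str(exc))
            return
        status = (PurgeStatus.COMPLETED_WITH_LEFTOVERS if leftovers
                  else PurgeStatus.COMPLETED)
        self._update(task_id, status=status, leftovers=leftovers)
```

The caller gets an ID back immediately and polls `status()`, which is where "are there some resources still behind" would be answered.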

In a nutshell, this is no small feat of engineering, and a spec covering both the Neutron design internals and the user-facing API must be pursued.

Thoughts?

Assaf Muller (amuller) wrote :

armax said:
> * If no REST API is going to be provided to start with, the tool must be invoked from the node where a single server is running, no? This sounds pretty limiting to me, but if acceptable, we should make this clear upfront. Not sure what we'd gain by not exposing the API, since operators would still use this in the interim and switching to something different is still going to be painful. The difference in this case is that we'd make their life miserable by asking them to ssh into one of the boxes, when they could neatly do that from their home laptop connected to the admin URL. It's either that, or I am grossly missing something and I am making a fool of myself :)

If the CLI script uses only the Neutron API client, as I suggested, then it could be used from anywhere the client is installed (that is the reason I suggested it use the API client and not the plugin layer; I'm sorry if I was not clear).

> * We can't expect the submitter of the Neutron-side purge code to be the same person who files a cross-project spec. That requires a different set of skills, i.e. we need a seasoned dude like yourself to help champion it through. Are you prepared to do the legwork?

To be honest, it's dangerous for me to commit to something like that; it might end up not happening because of my own time constraints. I mean well, but... I would prefer we define something fairly concrete and then have John file the spec (if he is willing). If push comes to shove, I'll do it.

> * We should be careful about the chain of dependencies between known and unknown (extension) resources, so the framework must be planned to allow for hooks and all.

Extension top level resources? Can you elaborate? Are you talking about *aaS, or something like QoS? Couldn't we use discovery to see if the Neutron server supports the extension and delete conditionally? I am not sure what you mean by 'hooks'.
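One way to do the discovery suggested here: read the aliases the server advertises and only purge resource types whose extension is loaded. The input has the shape returned by python-neutronclient's `list_extensions()`; the alias mapping below is an illustrative assumption and would need checking against the deployed server.

```python
# Extension alias each resource type depends on (None = core resource).
# This mapping is an assumption for the sketch, not an authoritative list.
REQUIRED_EXTENSION = {
    "ports": None,
    "networks": None,
    "routers": "router",
    "floatingips": "router",
    "security_groups": "security-group",
    "firewalls": "fwaas",
}


def purgeable_types(extensions_response):
    """Return the resource types whose extension the server advertises.

    `extensions_response` looks like {"extensions": [{"alias": ...}, ...]}.
    """
    aliases = {ext["alias"] for ext in extensions_response["extensions"]}
    return [kind for kind, alias in REQUIRED_EXTENSION.items()
            if alias is None or alias in aliases]
```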

> * Testing: the tool can easily break if we don't test it continuously, in that a commit that changes a dependency, or a policy rule change, can easily invalidate the dependency chain, and that can prevent the deletion, leading to stale resources. This also brings me to the next point.

Naturally :) The script should have a functional test, we can discuss this in the implementation phase.

> * purge is intrinsically an async operation, therefore the API should be task-oriented, where the user can check the operation status, whether it has completed successfully, whether there are some resources still behind etc.

I had not considered this. We should discuss this further.

Armando Migliaccio (armando-migliaccio) wrote :

> If the CLI script uses only the Neutron API client, as I suggested, then it could be used from anywhere the client is installed (that is the reason I suggested it use the API client and not the plugin layer; I'm sorry if I was not clear).

But then how is it any different from os-purge itself?

> To be honest, it's dangerous for me to commit to something like that; it might end up not happening because of my own time constraints. I mean well, but... I would prefer we define something fairly concrete and then have John file the spec (if he is willing). If push comes to shove, I'll do it.

Understood. I am only saying that John might not be well equipped to drive that; if you can't do it, for good reasons, we'd need to figure out who would.

> Extension top level resources? Can you elaborate? Are you talking about *aaS, or something like QoS? Couldn't we use discovery to see if the Neutron server supports the extension and delete conditionally? I am not sure what you mean by 'hooks'.

Sorry I wasn't clear: I meant stuff like firewall, vpns, load balancers, unicorns and butterflies that Neutron might not be necessarily aware of. We can delete a router, but if a router's deletion is prevented by a unicorn attached to it, then we gotta figure out a way to delete the unicorn first. Are you with me?
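A hypothetical hook registry along these lines would let each extension declare what blocks what, so the "unicorn" is removed before the router. It uses `graphlib.TopologicalSorter` from the Python 3.9+ standard library; `PurgeRegistry` and its method names are invented for this sketch.

```python
from graphlib import TopologicalSorter


class PurgeRegistry:
    """Hypothetical registry of per-resource delete hooks.

    Each resource type registers a delete hook plus the types that can
    block its deletion; blockers are always purged first.
    """

    def __init__(self):
        self._hooks = {}       # resource type -> delete hook(project_id)
        self._blocked_by = {}  # resource type -> types that must go first

    def register(self, kind, delete_hook, blocked_by=()):
        self._hooks[kind] = delete_hook
        self._blocked_by[kind] = set(blocked_by)

    def order(self):
        # Ignore blockers nobody registered (e.g. a disabled extension).
        # TopologicalSorter yields a node only after all its predecessors
        # and raises graphlib.CycleError if the declarations form a loop.
        graph = {kind: {b for b in deps if b in self._hooks}
                 for kind, deps in self._blocked_by.items()}
        return list(TopologicalSorter(graph).static_order())

    def purge(self, project_id):
        for kind in self.order():
            self._hooks[kind](project_id)
```

An out-of-tree service plugin would only need to call `register()`; the core purge never has to know what a unicorn is.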

> Naturally :) The script should have a functional test, we can discuss this in the implementation phase.

Ok, no argument about this.

> I had not considered this. We should discuss this further.

Let me know if I am talking nonsense.

John Davidge (john-davidge) wrote :

Great discussion here, allow me to weigh in.

First, I should say that I am not particularly interested in (nor well equipped for) leading a cross-project spec for this. I am of the opinion that if we try to agree on syntax etc across all projects (and which projects would we include?) before getting started then nothing will get done for a very long time. This is a consistent pain point from the ops group so I'd prefer to fix it as soon as possible. That doesn't necessarily mean that I think we should do this work in a vacuum. We already have an established approach in the form of ospurge, and I think that we should leverage that in creating an API that operators will already be familiar with. When we have something that works we can use that in leading the proposal of a cross-project spec with already established conventions.

I'm not sure I see the benefit in not making this API-accessible from the beginning. If we build this feature then we should make it as easy to use as possible otherwise we're just creating another pain point for operators. If future cross-project work means that we have to make changes to the API then that's fine.

As for what that API would look like, I'd like to propose that we follow the conventions already established by ospurge. Many operators will already be familiar with the commands, and they are generic enough to work across projects in the future. Specifically, it would look like:

$ neutron purge <project_id>

This is based on the assumption that we will make this an admin-only API to begin with, but we could also allow users to purge their own projects with a simple:

$ neutron purge

We can also make use of ospurge's --dry-run syntax to return a list of the resources that would be deleted by a full purge.
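A dry-run mode in the spirit of ospurge's `--dry-run` only has to compute the target list and skip the deletes. In this sketch the `inventory` layout and the `delete` callback are hypothetical stand-ins for real API calls.

```python
def purge(project_id, inventory, delete, dry_run=False):
    """Purge one project's resources, or just report them when dry_run=True.

    `inventory` maps resource type -> list of (resource_id, owner_id) pairs;
    `delete(kind, resource_id)` performs one deletion. Both are hypothetical.
    Returns the (kind, resource_id) pairs that were (or would be) deleted.
    """
    targets = [(kind, res_id)
               for kind, items in inventory.items()
               for res_id, owner in items
               if owner == project_id]
    if not dry_run:
        for kind, res_id in targets:
            delete(kind, res_id)
    return targets
```

Because both modes share the same target computation, the dry-run output is exactly what a real run would attempt.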

Assaf - the work that you did last year [1] will be very helpful in providing a starting point for deletion orders etc. Thank you.

I think we have a good opportunity to remove a pain point for operators here. Should I write a spec before getting started or are we more or less agreed on a direction here for the first pass?

[1] - https://review.openstack.org/#/c/92600/5/neutron/services/identity/project_deleted.py

Armando Migliaccio (armando-migliaccio) wrote :

I was syncing up with Nova, and it sounds like any requirement that would impose server-side orchestration in Nova is not going to fly, ever (see [1] for details). This means that any cross-project initiative is dead from the start. Taking into account Assaf's input, and knowing that a tool like os-purge is available and does the job this RFE was meant to address, what's left for us to do?

I am tempted to reopen this for discussion to prevent us from going down the rabbit hole.

[1] http://docs.openstack.org/developer/nova/project_scope.html#no-more-orchestration

tags: added: rfe
removed: rfe-approved
Armando Migliaccio (armando-migliaccio) wrote :

@John: based on discussion [1], we decided to abandon the cross-project initiative. The current marching order is: let's integrate directly into the neutron client, i.e. expose a neutron purge --project-id command (feel free to elaborate more during code review). OS-purge may then reference this supported neutron capability or not. So in a nutshell: no server-side orchestration.

[1] http://eavesdrop.openstack.org/meetings/neutron_drivers/2015/neutron_drivers.2015-12-15-15.01.log.html

Changed in neutron:
milestone: none → mitaka-2
tags: added: rfe-approved
removed: rfe
summary: [RFE] Support cleanup of all resources associated with a given tenant
- with a single API call
Changed in neutron:
milestone: mitaka-2 → mitaka-3
John Davidge (john-davidge) wrote :

FYI, I've recently started working on a patch to tackle this. Will have a WIP soon(ish).

Armando Migliaccio (armando-migliaccio) wrote :

great to hear!

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/275460

Changed in neutron:
status: Triaged → In Progress
Changed in neutron:
status: In Progress → Won't Fix
assignee: John Davidge (john-davidge) → nobody
Changed in python-neutronclient:
assignee: nobody → John Davidge (john-davidge)
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by John Davidge (<email address hidden>) on branch: master
Review: https://review.openstack.org/275460

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-neutronclient (master)

Fix proposed to branch: master
Review: https://review.openstack.org/276541

Changed in python-neutronclient:
status: Confirmed → In Progress
Changed in neutron:
milestone: mitaka-3 → none
milestone: none → mitaka-3
Changed in python-neutronclient:
importance: Undecided → Wishlist
Changed in neutron:
status: Won't Fix → Invalid
Changed in python-neutronclient:
assignee: John Davidge (john-davidge) → Reedip (reedip-banerjee)
Changed in python-neutronclient:
assignee: Reedip (reedip-banerjee) → John Davidge (john-davidge)
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-neutronclient (master)

Reviewed: https://review.openstack.org/276541
Committed: https://git.openstack.org/cgit/openstack/python-neutronclient/commit/?id=88fbd3870cf3872fb0c9b4269503c62458664b3b
Submitter: Jenkins
Branch: master

commit 88fbd3870cf3872fb0c9b4269503c62458664b3b
Author: John Davidge <email address hidden>
Date: Mon Feb 15 10:02:54 2016 -0800

    Support cleanup of tenant resources with a single API call

    The addition of the 'neutron purge' command allows cloud admins
    to conveniently delete multiple neutron resources associated
    with a given tenant.

    The command will delete all supported resources provided that
    they can be deleted (not in use, etc) and feedback the amount
    of each resource deleted to the user. A completion percentage
    is also given to keep the user informed of progress.

    Currently supports deletion of:

    Networks
    Subnets (implicitly)
    Routers
    Ports (including router interfaces)
    Floating IPs
    Security Groups

    This feature can be easily extended to support more resource
    types in the future.

    DocImpact: Update API documentation to describe neutron-purge usage

    Change-Id: I5a366d3537191045eb53f9cccd8cd0f7ce54a63b
    Closes-Bug: 1511574
    Partially-Implements: blueprint tenant-delete
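The per-type counts and completion percentage described in the commit message can be sketched as below. This is not the python-neutronclient code: `delete`, `report` and the message wording are assumptions for illustration.

```python
from collections import Counter


def purge_with_progress(resources, delete, report=print):
    """Delete (kind, id) pairs, reporting percent complete and per-type counts.

    `delete(kind, res_id)` is a hypothetical hook that returns True on
    success and False if the resource could not be deleted (in use, etc.).
    Returns (deleted, failed) Counters keyed by resource type.
    """
    deleted, failed = Counter(), Counter()
    total = len(resources)
    for done, (kind, res_id) in enumerate(resources, start=1):
        if delete(kind, res_id):
            deleted[kind] += 1
        else:
            failed[kind] += 1
        report(f"Purging resources: {done * 100 // total}% complete.")
    for kind in sorted(deleted):
        report(f"Deleted {deleted[kind]} {kind}.")
    return deleted, failed
```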

Changed in python-neutronclient:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/python-neutronclient 4.1.0

This issue was fixed in the openstack/python-neutronclient 4.1.0 release.

Changed in python-neutronclient:
milestone: none → 4.0.0
no longer affects: neutron