config fails to delete objects with stale hrefs

Bug #1637498 reported by Martin Millnert
This bug affects 1 person
Affects              Status    Importance   Assigned to     Milestone
Juniper Openstack    (status tracked in Trunk)
  R3.0               Invalid   Medium       Sachin Bansal
  R3.1               Invalid   Medium       Sachin Bansal
  R3.2               Invalid   Medium       Sachin Bansal
  Trunk              Invalid   Medium       Sachin Bansal

Bug Description

Contrail config fails to delete objects that have stale child hrefs; the delete is rejected with "Delete when children still present".
In a real OpenStack deployment, the order in which resources are deleted is not very strict.

In OpenStack there are really only two different cases when deleting a project:
1) The project has no dependent (child) resource,
2) The project has dependent (child) resources

In both prod and dev environments we always end up, especially after running openstack-rally load tests, with a cfgdb that is massively out of sync with the OpenStack database.
This eventually leads to a complete Contrail meltdown, apparently caused by contrail-svc-monitor timeouts (the list of objects to delete grows too long, and it fails to delete them); the follow-on effects bring down the API server and a world of pain ensues. We haven't yet tried to reproduce this on an up-to-date R3.1.

The example above uses a project, but the problem is generic: there are several cases where Contrail's VNC code fails to properly sync (delete) from OpenStack to Contrail because of stale child object links. I haven't mapped them all out yet.

In VNC (contrail-controller/src/config/vnc_openstack/vnc_openstack/__init__.py) there is both the sync/resync code and the pre_*_delete functions, such as pre_project_delete. The API server (contrail-controller/src/config/api-server/vnc_cfg_api_server.py) has, for example, the http_resource_delete function.

I believe that, for example, the http_resource_delete method in the API server should check for stale hrefs, i.e. under https://github.com/Juniper/contrail-controller/blob/30568809ae3e8fe61d50ffed90e49bcae6e03962/src/config/api-server/vnc_cfg_api_server.py#L915
For example, I currently have projects that cannot be deleted because of two child links, to a virtual-network and a virtual-machine-interface, neither of which actually exists.
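
To illustrate the kind of check meant here, below is a rough sketch against a plain in-memory dict, not the actual API server code. The names split_children and can_delete, the object layout, and the db mapping are all made up for illustration; the real http_resource_delete works on different structures.

```python
# Hypothetical in-memory stand-in for the config database (uuid -> object).
# None of these names come from contrail-controller.

def split_children(parent, db):
    """Return (live, stale) lists of child hrefs for `parent`.

    A child counts as live only if its uuid is actually present in `db`;
    an href pointing at a missing uuid is treated as stale.
    """
    live, stale = [], []
    for child in parent.get("children", []):
        (live if child["uuid"] in db else stale).append(child["href"])
    return live, stale


def can_delete(parent, db):
    """Refuse deletion only when at least one child really exists."""
    live, stale = split_children(parent, db)
    if live:
        return False, "Delete when children still present: %s" % live
    return True, "ok (ignored %d stale hrefs)" % len(stale)


# A project whose two child links both point at objects that no longer exist.
db = {"vn-1": {"type": "virtual-network"}}
project = {
    "uuid": "proj-1",
    "children": [
        {"uuid": "vn-gone", "href": "/virtual-network/vn-gone"},
        {"uuid": "vmi-gone", "href": "/virtual-machine-interface/vmi-gone"},
    ],
}
print(can_delete(project, db))
```

With a check like this, the project above would be deletable, while a project with a child link to vn-1 would still be refused.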

I see two solutions (both probably worth pursuing):
1) Backrefs should be pruned when an object is deleted, i.e. when those child objects were actually deleted (there is necessarily at least one code path that doesn't do this),
2) http_resource_delete should check that child objects actually exist, not just that there is an outbound HREF to them.

There's a third one, but I'm less sure about it, as it is a band-aid rather than a proper fix for data model cleanliness:
3) Extend pre_*_delete with more deletes on child objects, where appropriate.
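
As a rough sketch of what solution 1 amounts to, again on a made-up in-memory model rather than the real cfgdb code: deleting a child also removes the parent's link to it, so no stale href is left behind. The delete_object name and the parent_uuid/children fields are assumptions for illustration only.

```python
# Hypothetical model: db maps uuid -> object; each object may carry a
# "parent_uuid" and a "children" list of child uuids. Not the Contrail schema.

def delete_object(uuid, db):
    """Delete `uuid` from `db` and prune the link held by its parent."""
    obj = db.pop(uuid)
    parent = db.get(obj.get("parent_uuid"))
    if parent is not None:
        # Drop the parent's reference so it does not become a stale link.
        parent["children"] = [c for c in parent.get("children", []) if c != uuid]
    return obj


db = {
    "proj-1": {"children": ["vn-1", "vmi-1"]},
    "vn-1": {"parent_uuid": "proj-1"},
    "vmi-1": {"parent_uuid": "proj-1"},
}
delete_object("vn-1", db)
delete_object("vmi-1", db)
print(db["proj-1"]["children"])
```

After both children are deleted, the project has no child links left and a plain "has children" check would let it be deleted.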

information type: Proprietary → Public
description: updated
Sachin Bansal (sbansal) wrote :

As discussed over Slack, the problem was an incorrect keystone config, hence marking this invalid.

Martin Millnert (r-martin-5) wrote :

It does imply that running without keystone configured in vnc_api_lib and without auth = keystone in contrail-api.conf results in a non-functional Contrail. If no current deployment scenario allows for this, then that's all fine I guess. We were stuck with a one-year-old way of doing things.

This report contains Public information  
Everyone can see this information.