Juniper Openstack

config fails to delete objects with stale hrefs

Bug #1637498 reported by Martin Millnert on 2016-10-28

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
Juniper Openstack	Status tracked in Trunk
R3.0	Invalid	Medium	Sachin Bansal
R3.1	Invalid	Medium	Sachin Bansal
R3.2	Invalid	Medium	Sachin Bansal	Juniper Openstack r3.2.0.0-fcs "r3.2.0.0"
Trunk	Invalid	Medium	Sachin Bansal	Juniper Openstack r4.0.0.0-fcs "r4.0.0.0"

Bug Description

config fails to delete objects that have stale hrefs, with the error "Delete when children still present".
In a real OpenStack deployment, the order in which resources are deleted is not very strict.

In OpenStack there are really only two different cases when deleting a project:
1) The project has no dependent (child) resources,
2) The project has dependent (child) resources.

In both prod and dev environments, and especially after running openstack-rally load tests, we always end up with a cfgdb that is massively out of sync with the OpenStack database.
This eventually leads to a complete Contrail meltdown, apparently due to contrail-svc-monitor timeouts (the list of objects to delete grows too long, and it fails to delete them), with follow-on effects that bring down the API server, and a world of pain ensues. We haven't yet tried to reproduce this on an up-to-date R3.1.

The example above used a project, but the problem is actually generic; there are several cases where Contrail's vnc code fails to properly sync (delete) from openstack->contrail because of stale child object links. I haven't mapped them all out yet.

The relevant code is in two places. In VNC (contrail-controller/src/config/vnc_openstack/vnc_openstack/__init__.py) there is both the sync/resync code and the pre_*_delete code, such as pre_project_delete. In the API server (contrail-controller/src/config/api-server/vnc_cfg_api_server.py) there is e.g. the http_resource_delete function.

I believe that, for example, the http_resource_delete method in the API server should check for stale hrefs, i.e. around https://github.com/Juniper/contrail-controller/blob/30568809ae3e8fe61d50ffed90e49bcae6e03962/src/config/api-server/vnc_cfg_api_server.py#L915
For example, I currently have projects that can't be deleted because of two child links, to a virtual-network and a virtual-machine-interface, neither of which actually exists.

I see two solutions (both probably worth pursuing):
1) Backrefs should be pruned when an object is deleted, i.e. when those child objects were actually deleted (there must be at least one code path that doesn't do this),
2) http_resource_delete should check that child objects actually exist, not just that there is an outbound href to them.

There's a third option, but I'm less sure about it, as it is a band-aid rather than a proper fix for data model cleanliness:
3) Extend pre_*_delete with more deletes of child objects, where appropriate.
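
Option 2 above can be illustrated with a minimal sketch. This is not contrail-controller code: the in-memory dict standing in for cfgdb and the names find_live_children and delete_object are hypothetical, chosen only to show the idea of refusing a delete when a child really exists and ignoring links whose target is already gone.

```python
# Hypothetical stand-in for the config database: uuid -> object dict.
# None of these names come from contrail-controller; they only illustrate
# checking that children exist instead of trusting the stored links.

def find_live_children(db, obj):
    """Split an object's child links into (live, stale) uuid lists."""
    live, stale = [], []
    for child_uuid in obj.get("children", []):
        if child_uuid in db:
            live.append(child_uuid)   # child object really exists
        else:
            stale.append(child_uuid)  # link points at a deleted object
    return live, stale


def delete_object(db, uuid):
    """Delete an object unless it still has children that actually exist.

    Returns the list of stale child links that were ignored.
    """
    obj = db[uuid]
    live, stale = find_live_children(db, obj)
    if live:
        raise RuntimeError("Delete when children still present: %s" % live)
    del db[uuid]
    return stale
```

With a project whose only child links point at objects that no longer exist, delete_object removes the project and reports the stale links; with a child that is still in the database, it raises the same kind of error as today.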

Martin Millnert (r-martin-5) on 2016-10-28

information type:	Proprietary → Public
description:	updated

Sachin Bansal (sbansal) wrote on 2016-11-04:

As discussed over Slack, the problem was an incorrect keystone config, hence marking this invalid.

Martin Millnert (r-martin-5) wrote on 2016-11-04:

It does imply that not having keystone configured in vnc_api_lib and auth = keystone in contrail-api.conf results in a non-functional Contrail. If no current deployment scenario allows for this, then that's all fine, I guess. We were stuck with a one-year-old modus.
