Comment 0 for bug 1637498

Martin Millnert (r-martin-5) wrote :

The config API server fails to delete objects that have stale hrefs, rejecting the request with "Delete when children still present".
In a real OpenStack deployment, the order in which resources are deleted is not very strict.

In OpenStack there are really only two cases when deleting a project:
1) The project has no dependent (child) resources,
2) The project has dependent (child) resources.

In both prod and dev environments, and especially after running openstack-rally load tests, we always end up with a cfgdb that is massively out of sync with the OpenStack database.
This eventually leads to a complete contrail meltdown, caused by what appear to be contrail-svc-monitor timeouts (the list of objects to delete grows too long, and it fails to delete them). The follow-on effects bring down the API server, and a world of pain ensues.

The example above used a project, but the problem is generic: there are several cases where contrail's vnc code fails to properly sync (delete) from openstack->contrail because of stale child object links. I haven't mapped them all out yet.

The relevant code lives in two places. In VNC (contrail-controller/src/config/vnc_openstack/vnc_openstack/__init__.py) there is the sync/resync code and the pre_*_delete hooks, such as pre_project_delete. In the API server (contrail-controller/src/config/api-server/vnc_cfg_api_server.py) there is, for example, the http_resource_delete function.

I believe, for example, that the API server's http_resource_delete method should check for stale hrefs, i.e. under https://github.com/Juniper/contrail-controller/blob/30568809ae3e8fe61d50ffed90e49bcae6e03962/src/config/api-server/vnc_cfg_api_server.py#L915
Right now, for example, I have projects that cannot be deleted because of two child links, a virtual-network and a virtual-machine-interface, neither of which actually exists.

I see two solutions (both probably worth pursuing):
1) Backrefs should be pruned when an object is deleted, i.e. when those child objects were actually deleted (there is necessarily at least one code path that doesn't do this),
2) http_resource_delete should check that child objects actually exist, not just that there is an outbound HREF to them.
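
To make the two proposals concrete, here is a minimal sketch in Python. It is not the contrail code: ConfigStore, raw_delete, live_children and the uuids are all hypothetical names, and a plain dict stands in for the config DB. It only illustrates refusing a delete when a child really exists (solution 2) and pruning the parent's link when an object is deleted (solution 1).

```python
# Hypothetical in-memory stand-in for the config DB; NOT the real contrail API.
class ConfigStore:
    def __init__(self):
        # uuid -> {"type": str, "children": set of child uuids}
        self.objects = {}

    def create(self, uuid, obj_type, parent=None):
        self.objects[uuid] = {"type": obj_type, "children": set()}
        if parent is not None:
            self.objects[parent]["children"].add(uuid)

    def raw_delete(self, uuid):
        """Simulate the buggy path: drop the object, leave the parent's link."""
        self.objects.pop(uuid, None)

    def live_children(self, uuid):
        """Solution 2: only count children that still exist in the store."""
        return {c for c in self.objects[uuid]["children"] if c in self.objects}

    def delete(self, uuid):
        live = self.live_children(uuid)
        if live:
            raise RuntimeError(
                "Delete when children still present: %s" % sorted(live))
        # Solution 1: prune links to this object from any parent.
        for other in self.objects.values():
            other["children"].discard(uuid)
        del self.objects[uuid]


store = ConfigStore()
store.create("proj-1", "project")
store.create("vn-1", "virtual-network", parent="proj-1")
store.create("vmi-1", "virtual-machine-interface", parent="proj-1")

# Children vanish without the parent being updated, leaving stale links.
store.raw_delete("vn-1")
store.raw_delete("vmi-1")

# The stale links no longer block the delete.
store.delete("proj-1")
```

A real fix would have to do the existence check against the actual database and decide whether to repair the stale links it finds or merely ignore them; the sketch ignores them and lets them disappear with the parent.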

There's a third one, but I'm less sure about it, as it is a band-aid rather than a proper fix for data model cleanliness:
3) Extend pre_*_delete with more deletes on child objects, where appropriate.