Removing unused base images removes backing files of active instances

Bug #1620341 reported by Jacolex
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I've been experiencing a dangerous issue: backing files located in the _base folder on shared storage are being removed by nova-compute. It happens on the Juno, Kilo and Liberty releases. The shared storage mount, /var/lib/nova/instances, is configured on NFSv3. The backing image IDs exist in the /var/lib/nova/instances/locks/ folder for the affected files. I don't know for sure how the mechanism that prevents _base files from being deleted works, whether it depends on the locks folder or on locking files on the shared storage, but from my point of view this is a bug by design, and the mechanism should be redesigned so that it does not rely on the client, which is actually the compute node. It has a serious impact on the stability and security of users' data!
I want to ask you to consider a new cleaning system, because the current cleaning worker is designed for independent compute nodes without shared storage, and it looks like it was not well adapted for configurations with shared storage. Maybe the developers should consider a central mechanism that fetches data about used and unused _base files from the database, rather than relying on what is or is not running locally on a compute node.
I can't reproduce this problem anymore because I had to disable the cleaning of unused base images and deploy my own, secure worker.
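
To make the proposal above concrete, here is a minimal sketch of the selection step that a central cleaner might perform. It is not nova's image cache manager and not the reporter's own worker. The function name and both arguments are hypothetical, and it assumes that some central source (for example, the database) can supply the backing files used by every instance on the shared storage, not only those running on the local compute node.

```python
import os
from typing import Iterable, Optional, Set


def select_removable_base_files(
    base_dir_files: Iterable[str],
    backing_files_in_use: Optional[Iterable[str]],
) -> Set[str]:
    """Return the cached base files that no known instance references.

    base_dir_files: paths found in the shared _base folder.
    backing_files_in_use: backing file paths of ALL instances that share
        the storage (hypothetical input, e.g. gathered centrally), or
        None if that list could not be obtained.
    """
    # Fail safe: if we do not know what is in use, remove nothing.
    if backing_files_in_use is None:
        return set()

    # Compare by file name so that differing mount prefixes on
    # different compute nodes do not hide a match.
    in_use = {os.path.basename(p) for p in backing_files_in_use}
    return {f for f in base_dir_files if os.path.basename(f) not in in_use}
```

The point of the sketch is that the decision uses a list covering every compute node sharing the mount, and that an unknown list results in no deletions at all.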

Jacolex (jacolex)
description: updated
Matt Riedemann (mriedem)
tags: added: compute image-cache libvirt nfs
Revision history for this message
Matthew Booth (mbooth-9) wrote :

There's not a lot to go on here, unfortunately. The problem of image cache manager deleting things that are still in use is well known, and in fact we have workarounds elsewhere in the code for it. Out of interest I'd be curious to know how this impacts users, because it also means that the workaround code is broken and/or incomplete.

The underlying problem is obviously image cache manager itself. Without anything more specific to go on I can throw up a patch which fixes a couple of races I was vaguely aware of but hadn't gotten round to addressing. I can't guarantee it will fix your problem, but it might.

Matthew Booth (mbooth-9) wrote :

So, I was answering this in the context of master, but looking again at change I376cc951922c338669fdf3f83da83e0d3cea1532, that didn't land until Mitaka, and you only mention up to Liberty. There's a decent chance that's the specific issue you're hitting. You'll have to upgrade to get it though, obviously.

Matthew Booth (mbooth-9) wrote :

Backport was rejected here: https://review.openstack.org/#/c/278928/

Matthew Booth (mbooth-9) wrote :

Some races fixed in this patch: https://review.openstack.org/366239

As above, though, I suspect your problem is more likely to be resolved by change I376cc951.

Sylvain Bauza (sylvain-bauza) wrote :

Given that the problem was reported on Liberty and may have been resolved in Mitaka, could you please try to reproduce it, ideally on the master branch (or at least Newton), to see whether it still exists?

Changed in nova:
status: New → Incomplete
Jacolex (jacolex) wrote :

OK, I'll try to reproduce the problem on the Newton branch.

Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/366239
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.
