Instance files left over in case nova-compute is halted

Bug #814561 reported by Alexander Novikov
This bug affects 1 person
Affects                   Status         Importance   Assigned to     Milestone
OpenStack Compute (nova)  Fix Released   Low          Derek Higgins
Diablo                    Fix Released   Undecided    Unassigned

Bug Description

Environment: release packages from http://ppa.launchpad.net/nova-core/release/ubuntu natty main (Ubuntu 11.04), x64, KVM, FlatDHCP network mode.

If a compute node is halted for some reason, the files of an instance that was running on it may remain in /var/lib/nova/instances, even though euca-describe-instances reports the instance as gone. Trying to terminate it (euca-terminate-instances) then logs "Error: trying to destroy already destroyed instance: 38" in nova-compute.log.
I have also seen a different behaviour (probably due to a network misconfiguration): the compute node actually terminated the instance, but nova-api (euca-describe-instances) kept showing it as "running" or "terminating". You could run euca-terminate-instances on it again and again without errors but with no effect. Rebooting all nodes (possibly not the network node) helped.

To reproduce: get a working cloud (Cactus release), start an instance, and wait until it is running. Then do the following:
Stop networking on the compute node (or physically unplug its interfaces, both if there are two), stop nova-compute from a terminal on that node, destroy the VM domain, and on the API node issue euca-terminate-instances for the instance. euca-describe-instances shows the "terminating" state until networking is restored on the compute node and nova-compute is started again. The instance then appears to be terminated, but its files remain.
You can get the same result without touching the network: stop nova-compute, destroy the VM domain with virsh, and issue euca-terminate-instances.

My point is that nova-compute should somehow record or check which files it has put in /var/lib/nova/instances, including the _base directory, and whether they are in sync with the real situation, that is, with the running instances and the downloaded image cache (which may be out of sync too). This could be a transaction that also checks the files when a terminate request arrives from nova-api, with the compute node reporting a state such as "already terminated". Or it could be some "hash" of the cached images, even just the file size, that is checked before an instance is actually started from them. (You can emulate a bad cache by replacing, for example, the cached kernel image with any file of the same name, such as a zero-size one.)
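
The two checks suggested above could look roughly like the following Python sketch. It is not nova code: the function names, the `known_instances` set (which would have to come from the database or the hypervisor) and the `expected_size` argument are hypothetical, and a file-size comparison is only a cheap sanity check, not a real hash.

```python
import os


def find_orphaned_instance_dirs(instances_path, known_instances, base_dir="_base"):
    """Return directories under instances_path that match no known instance.

    known_instances is a set of directory names that are expected to exist
    (hypothetical input; where it comes from is outside this sketch).
    The image cache directory (base_dir) is never reported as an orphan.
    """
    orphans = []
    for entry in sorted(os.listdir(instances_path)):
        full_path = os.path.join(instances_path, entry)
        if entry == base_dir or not os.path.isdir(full_path):
            continue
        if entry not in known_instances:
            orphans.append(full_path)
    return orphans


def cached_image_looks_valid(image_path, expected_size):
    """Cheap check before booting from a cached image: the file must exist
    and its size must equal the size recorded when it was downloaded."""
    try:
        return os.path.getsize(image_path) == expected_size
    except OSError:
        # A missing or unreadable file is treated as an invalid cache entry.
        return False
```

A periodic task could log or delete whatever `find_orphaned_instance_dirs` returns, and the size check would catch the zero-size kernel image case described above.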

Eventually you will also need a feature such as automatic clearing of the image cache, because no server has unlimited space, and on a production farm a great many different VMs can run on the same node over a long period of time.
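
One possible shape for such cache clearing is a least-recently-used sweep with a size budget, sketched below in Python. This is an illustration only, not how nova manages its cache: `prune_image_cache`, `max_bytes` and `in_use` are hypothetical names, and the sketch relies on file access times, which are not updated on filesystems mounted with noatime.

```python
import os


def prune_image_cache(cache_dir, max_bytes, in_use=frozenset()):
    """Delete least recently accessed files until the cache fits max_bytes.

    Files whose names are in in_use (images backing running instances)
    are never deleted. Returns the names of the removed files.
    """
    files = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            files.append((st.st_atime, st.st_size, name, path))

    total = sum(size for _, size, _, _ in files)
    removed = []
    # Oldest access time first.
    for _, size, name, path in sorted(files):
        if total <= max_bytes:
            break
        if name in in_use:
            continue
        os.remove(path)
        total -= size
        removed.append(name)
    return removed
```

Called with the _base directory and the names of images still referenced by running instances, this keeps the cache bounded without touching anything in use.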

Thanks for your attention!

Thierry Carrez (ttx)
summary: - dead or hunged instance, sync nova-api & halted nova-compute
+ Instance files left over in case nova-compute is halted
Changed in nova:
importance: Undecided → Low
status: New → Confirmed
Thierry Carrez (ttx) wrote :

Also related to bug 809614

Derek Higgins (derekh) wrote :
Brian Waldon (bcwaldon)
Changed in nova:
assignee: nobody → Derek Higgins (higginsd)
status: Confirmed → In Progress
Derek Higgins (derekh)
Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → essex-1
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/1316
Committed: http://github.com/openstack/nova/commit/ad7fcf225e126d2a719c04019c4daa1616d2159e
Submitter: Jenkins
Branch: master

 status fixcommitted
 done

commit ad7fcf225e126d2a719c04019c4daa1616d2159e
Author: Derek Higgins <email address hidden>
Date: Fri Nov 4 00:25:34 2011 +0000

    Undefine libvirt saved instances

    Fixes bug 814561

    Adding a call to managedSaveRemove if the instance has a
    saved instance, so they are now undefined in addition to running
    instances during destroy
    With test case

    Also added myself to Authors

    Change-Id: If93e8ac6972116152f38e187bd1a61c652855814
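
For readers unfamiliar with the libvirt call named in the commit message, the idea can be sketched as below. This is not the code from the commit: `destroy_and_undefine` is a hypothetical helper, and `domain` is assumed to be a libvirt `virDomain` object (or anything exposing the same `isActive`, `destroy`, `hasManagedSaveImage`, `managedSaveRemove` and `undefine` methods from the libvirt Python bindings).

```python
def destroy_and_undefine(domain):
    """Stop a domain if it is running, drop any managed save image,
    then remove the domain definition.

    domain is assumed to behave like libvirt.virDomain.
    """
    if domain.isActive():
        # Hard power-off of a running guest.
        domain.destroy()
    if domain.hasManagedSaveImage(0):
        # A saved (not running) instance keeps state on disk;
        # remove it so the domain can be undefined cleanly.
        domain.managedSaveRemove(0)
    domain.undefine()
```

The managed save check is done regardless of whether the guest was running, so a domain that was saved and never resumed is cleaned up the same way as an active one.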

Thierry Carrez (ttx)
Changed in nova:
milestone: essex-1 → none
Openstack Gerrit (openstack-gerrit) wrote : Fix merged to nova (stable/diablo)

Reviewed: https://review.openstack.org/1676
Committed: http://github.com/openstack/nova/commit/27b0ff5ccd66eaaeda2ac2a013815c4d34cf9ec9
Submitter: Jenkins
Branch: stable/diablo

 tag in-stable-diablo
 done

commit 27b0ff5ccd66eaaeda2ac2a013815c4d34cf9ec9
Author: Derek Higgins <email address hidden>
Date: Fri Nov 4 00:25:34 2011 +0000

    Undefine libvirt saved instances

    Fixes bug 814561

    Adding a call to managedSaveRemove if the instance has a
    saved instance, so they are now undefined in addition to running
    instances during destroy
    With test case

    Also added myself to Authors

    (cherry picked from commit ad7fcf225e126d2a719c04019c4daa1616d2159e)

    Change-Id: I760a15d2ab135d7c3d638ca3c4358d8600582411

Thierry Carrez (ttx)
Changed in nova:
milestone: none → essex-2
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote : Please test proposed package

Hello Alexander, or anyone else affected,

Accepted nova into oneiric-proposed. The package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed
Thierry Carrez (ttx)
Changed in nova:
milestone: essex-2 → 2012.1