Bug #971621 “nova delete lxc-instance umounts the wrong rootfs” : Series essex : Bugs : OpenStack Compute (nova)

Eric Dodemont (dodeeric) on 2012-04-02

description:	updated

Eric Dodemont (dodeeric) on 2012-04-03

description:	updated

Eric Dodemont (dodeeric) on 2012-04-06

description:	updated

Eric Dodemont (dodeeric) wrote on 2012-04-06:

#1

This problem happens with:

a) qcow2 root disk images (nbd block devices) ==> As described above

disk: qcow2 format

instance1 .../instance-00000001/disk --> nbd15 --> .../instance-00000001/rootfs
instance2 .../instance-00000002/disk --> nbd14 --> .../instance-00000002/rootfs
instance3 .../instance-00000003/disk --> nbd13 --> .../instance-00000003/rootfs

but also with:

b) raw root disk images (loop block devices)

nova.conf: use_cow_images=False

disk: raw format

instance1 .../instance-00000001/disk --> loop0 --> .../instance-00000001/rootfs
instance2 .../instance-00000002/disk --> loop1 --> .../instance-00000002/rootfs
instance3 .../instance-00000003/disk --> loop2 --> .../instance-00000003/rootfs

Eric Dodemont (dodeeric) wrote on 2012-04-07:

#2

Traces:

nova delete i1 (uuid: 9b1... / veth0 / loop0 / instance-0000000a)

Name  IP   veth  loop  rootfs  uuid
-------------------------------------
i1    4.3  0     0     i-a     9b1...
i2    4.4  1     1     i-b     af2...

NOK: loop1 is umounted/disconnected!!!

AMQP: req-d29...

user_uuid: cad... (dodeeric)
project_uuid: c1a... (lc2)

-----

OK (i-a / instance-0000000a):

2012-04-07 10:26:52 INFO nova.virt.libvirt.connection [req-d29c2ab4-cd57-4121-93c2-27e9d56cf1fb cadc85c18c6a4aee928c19653f48dd45 c1ab5bc8097b48278ef41db02ecf82eb] [instance: 9b1533e4-f973-43ef-a1b5-96ce52ae5db1] Deleting instance files /var/lib/nova/instances/instance-0000000a

NOK (loop1):

2012-04-07 10:26:52 DEBUG nova.utils [req-d29c2ab4-cd57-4121-93c2-27e9d56cf1fb cadc85c18c6a4aee928c19653f48dd45 c1ab5bc8097b48278ef41db02ecf82eb] Running cmd (subprocess): sudo nova-rootwrap umount /dev/loop1 from (pid=31501) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219

2012-04-07 10:26:52 DEBUG nova.utils [req-d29c2ab4-cd57-4121-93c2-27e9d56cf1fb cadc85c18c6a4aee928c19653f48dd45 c1ab5bc8097b48278ef41db02ecf82eb] Running cmd (subprocess): sudo nova-rootwrap losetup --detach /dev/loop1 from (pid=31501) execute /usr/lib/python2.7/dist-packages/nova/utils.py:219

Eric Dodemont (dodeeric) wrote on 2012-04-07:

#3

To summarize the bug: when you delete an LXC instance, nova always umounts the rootfs of the most recently created LXC instance, not the one being deleted!

Example:

We have three running LXC instances:

i1 ../instance-00000001/rootfs
i2 ../instance-00000002/rootfs
i3 ../instance-00000003/rootfs

If you run:

$ nova delete i3 ==> will umount ../instance-00000003/rootfs ==> OK

then:

$ nova delete i2 ==> will umount ../instance-00000003/rootfs ==> NOK

then:

$ nova delete i1 ==> will umount ../instance-00000003/rootfs ==> NOK
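
The symptom can be reproduced with a toy model. This is only a sketch, not nova's code: it assumes, as later comments in this report state, that a single shared object tracked the most recent mount. The names `ToyDriver`, `spawn` and `destroy` are hypothetical.

```python
class ToyDriver:
    """Toy model of a driver that tracks all mounts in one shared slot."""

    def __init__(self):
        # One slot shared by every instance: each spawn overwrites it.
        self.last_rootfs = None

    def spawn(self, name):
        self.last_rootfs = "../%s/rootfs" % name

    def destroy(self, name):
        # 'name' plays no part in choosing what gets unmounted.
        return "umount %s" % self.last_rootfs


driver = ToyDriver()
for name in ("instance-00000001", "instance-00000002", "instance-00000003"):
    driver.spawn(name)

results = [driver.destroy(n) for n in
           ("instance-00000003", "instance-00000002", "instance-00000001")]
for line in results:
    print(line)
```

All three calls yield the umount command for instance-00000003's rootfs, which matches the OK / NOK / NOK sequence above.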

Russell Bryant (russellb) on 2012-04-21

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High

Thierry Carrez (ttx) on 2012-06-20

tags:

added: lxc

Muharem Hrnjadovic (al-maisan) on 2012-07-11

Changed in nova:
assignee:	nobody → Muharem Hrnjadovic (al-maisan)
status:	Confirmed → In Progress

Chuck Short (zulcss) wrote on 2012-07-12:

#4

This should be an issue once the instances have been converted to uuids in nova/virt/libvirt/driver.py

Muharem Hrnjadovic (al-maisan) wrote on 2012-07-12: Re: [Bug 971621] Re: nova delete lxc-instance umounts the wrong rootfs

#5

On 07/12/2012 03:15 PM, Chuck Short wrote:
> This should be an issue once the instances have been converted to uuids
> in nova/virt/libvirt/driver.py
Thanks for the hint, Chuck! Will look that way.

Muharem Hrnjadovic (al-maisan) on 2012-07-24

Changed in nova:
assignee:	Muharem Hrnjadovic (al-maisan) → nobody
status:	In Progress → Confirmed

Pádraig Brady (p-draigbrady) on 2012-07-27

Changed in nova:
assignee:	nobody → Pádraig Brady (p-draigbrady)

Pádraig Brady (p-draigbrady) wrote on 2012-07-31:

#6

Just getting to this now.

Hmm, this is quite awkward.
There are 3 cases to consider for correctly cleaning up the LXC mount dirs

1. Multiple LXC instances (the orig bug)
virt/driver.py stores a global object tracking the mount,
and so will always clean the last LXC instance created.

Now you could let the operating system maintain state and just
`umount /the/lxc/instance/mount/dir`.
Unfortunately a simple umount won't suffice because
depending on the mount method we may need to
`fusermount -u` or clean up nbd devices etc.
So we'd need to store some state as to the
type of mount we performed.
Now maybe things could be arranged so that `umount ...` always works,
though that seems invasive and hacky at first glance.

2. Now because these are long lived mounts, nova could be restarted,
requiring state info for the umount to be maintained outside of nova.
We'd have to be careful to create this atomically, i.e. as a transaction
around the mount operation.

3. If the system is restarted, then the mounts would be gone too,
and so the persisted mount information should be ignored.
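
Case 1 says the teardown depends on how the rootfs was mounted, so some record of the mount type is needed. A minimal sketch of that idea follows. The `MountRecord` type, the `kind` values and the `teardown_commands` helper are hypothetical, and the exact commands nova runs are an assumption (the loop case mirrors the `umount` / `losetup --detach` pair seen in the traces in comment #2). The function only builds argument lists and does not execute anything.

```python
from collections import namedtuple

# Hypothetical record of how one rootfs was mounted.
MountRecord = namedtuple("MountRecord", ["kind", "device", "mount_dir"])


def teardown_commands(rec):
    """Return the commands (as argv lists) needed to undo one mount."""
    if rec.kind == "fuse":
        # FUSE mounts are released with fusermount, not umount.
        return [["fusermount", "-u", rec.mount_dir]]
    if rec.kind == "loop":
        # Unmount first, then free the loop device.
        return [["umount", rec.mount_dir],
                ["losetup", "--detach", rec.device]]
    if rec.kind == "nbd":
        # Unmount first, then disconnect the nbd device.
        return [["umount", rec.mount_dir],
                ["qemu-nbd", "-d", rec.device]]
    raise ValueError("unknown mount kind: %r" % (rec.kind,))


rec = MountRecord("loop", "/dev/loop0", "/tmp/example/rootfs")
for argv in teardown_commands(rec):
    print(" ".join(argv))
```

Keeping one such record per instance, rather than one for the whole driver, is what makes it possible to pick the right teardown for the instance being deleted.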

OpenStack Infra (hudson-openstack) wrote on 2012-08-01: Fix proposed to nova (master)

#7

Fix proposed to branch: master
Review: https://review.openstack.org/10655

Changed in nova:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2012-08-04: Fix merged to nova (master)

#8

Reviewed: https://review.openstack.org/10655
Committed: http://github.com/openstack/nova/commit/a5184d5dbf67630dac3abb69b1678b60807cfce7
Submitter: Jenkins
Branch: master

commit a5184d5dbf67630dac3abb69b1678b60807cfce7
Author: Pádraig Brady <email address hidden>
Date: Wed Aug 1 14:26:54 2012 +0100

fix unmounting of LXC containers

There were two issues here.

1. There was a global object stored for all instances,
thus the last mounted instance was always unmounted.

2. Even if there was only a single LXC instance in use,
the global object would be lost on restart of Nova.

Therefore we reset the internal state for the mount object,
by passing in the mount point to destroy_container(),
and querying the device in use for that mount point.

Fixes bug: 971621
Change-Id: I5442442f00d93f5e8b82f492d62918419db5cd3b
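
The idea of "querying the device in use for that mount point" can be sketched as below. This is an illustration under stated assumptions, not the code from the commit: the helper name `device_for_mount_point` is hypothetical, it reads text in the Linux `/proc/mounts` format, and the sample entries (paths, `ext4`, options) are made up. It does not decode the octal escapes (such as `\040` for a space) that `/proc/mounts` uses in paths.

```python
def device_for_mount_point(mount_dir, mounts_text):
    """Return the device mounted at mount_dir, or None if nothing is.

    mounts_text uses the /proc/mounts layout, one mount per line:
    <device> <mount point> <fstype> <options> <dump> <pass>
    """
    target = mount_dir.rstrip("/") or "/"
    device = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == target:
            # A later mount on the same path shadows an earlier one,
            # so keep the last match.
            device = fields[0]
    return device


# Made-up sample data in /proc/mounts format.
SAMPLE = """\
/dev/loop0 /var/lib/nova/instances/instance-00000001/rootfs ext4 rw 0 0
/dev/loop1 /var/lib/nova/instances/instance-00000002/rootfs ext4 rw 0 0
"""

found = device_for_mount_point(
    "/var/lib/nova/instances/instance-00000001/rootfs", SAMPLE)
print(found)
```

On a live Linux host the second argument could be the contents of `/proc/mounts`. Because the operating system holds this state, the lookup still works after nova has been restarted, which is the second issue the commit message names.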

Changed in nova:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2012-08-07: Fix proposed to nova (stable/essex)

#9

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/10962

Pádraig Brady (p-draigbrady) on 2012-08-07

Changed in nova:
milestone:	none → folsom-3

Thierry Carrez (ttx) on 2012-08-16

Changed in nova:
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2012-08-21: Fix merged to nova (stable/essex)

#10

Reviewed: https://review.openstack.org/10962
Committed: http://github.com/openstack/nova/commit/272b98d718a68f5f714543ed2948d49ffe052ca5
Submitter: Jenkins
Branch: stable/essex

commit 272b98d718a68f5f714543ed2948d49ffe052ca5
Author: Pádraig Brady <email address hidden>
Date: Tue Aug 7 15:17:08 2012 +0100

fix unmounting of LXC containers

There were two issues here.

1. There was a global object stored for all instances,
thus the last mounted instance was always unmounted.

2. Even if there was only a single LXC instance in use,
the global object would be lost on restart of Nova.

Therefore we reset the internal state for the mount object,
by passing in the mount point to destroy_container(),
and querying the device in use for that mount point.

Fixes bug: 971621
Change-Id: I5442442f00d93f5e8b82f492d62918419db5cd3b
Cherry-picked: a5184d5dbf67630dac3abb69b1678b60807cfce7

Dave Walker (davewalker) on 2012-08-24

Changed in nova (Ubuntu):
status:	New → Fix Released
Changed in nova (Ubuntu Precise):
status:	New → Confirmed

Adam Gandelman (gandelman-a) wrote on 2012-08-30: Verification report.

#11

Please find the attached test log from the Ubuntu Server Team's CI infrastructure. As part of the verification process for this bug, Nova has been deployed and configured across multiple nodes using precise-proposed as an installation source. After successful bring-up and configuration of the cluster, a number of exercises and smoke tests have been invoked to ensure the updated package did not introduce any regressions. A number of test iterations were carried out to catch any possible transient errors.

Please note the list of installed packages at the top and bottom of the report.

For records of upstream test coverage of this update, please see the Jenkins links in the comments of the relevant upstream code-review(s):

Trunk review: https://review.openstack.org/10655
Stable review: https://review.openstack.org/10962

As per the provisional Micro Release Exception granted to this package by the Technical Board, we hope this contributes toward verification of this update.

Adam Gandelman (gandelman-a) wrote on 2012-08-30:

#12

2012.1.3+stable-20120827-4d2a4afe-0ubuntu1.log (82.0 KiB, text/plain)

Test coverage log.

tags:

added: verification-done

Launchpad Janitor (janitor) wrote on 2012-09-03:

#13

This bug was fixed in the package nova - 2012.1.3+stable-20120827-4d2a4afe-0ubuntu1

---------------
nova (2012.1.3+stable-20120827-4d2a4afe-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot, fixes FTBFS in -proposed. (LP: #1041120)
  * Resynchronize with stable/essex (4d2a4afe):
    - [5d63601] Inappropriate exception handling on kvm live/block migration
      (LP: #917615)
    - [ae280ca] Deleted floating ips can cause instance delete to fail
      (LP: #1038266)

nova (2012.1.3+stable-20120824-86fb7362-0ubuntu1) precise-proposed; urgency=low

  * New upstream snapshot. (LP: #1041120)
  * Dropped, superseded by new snapshot:
    - debian/patches/CVE-2012-3447.patch: [d9577ce]
    - debian/patches/CVE-2012-3371.patch: [25f5bd3]
    - debian/patches/CVE-2012-3360+3361.patch: [b0feaff]
  * Resynchronize with stable/essex (86fb7362):
    - [86fb736] Libvirt driver reports incorrect error when volume-detach fails
      (LP: #1029463)
    - [272b98d] nova delete lxc-instance umounts the wrong rootfs (LP: #971621)
    - [09217ab] Block storage connections are NOT restored on system reboot
      (LP: #1036902)
    - [d9577ce] CVE-2012-3361 not fully addressed (LP: #1031311)
    - [e8ef050] pycrypto is unused and the existing code is potentially insecure
      to use (LP: #1033178)
    - [3b4ac31] cannot umount guestfs  (LP: #1013689)
    - [f8255f3] qpid_heartbeat setting in ineffective (LP: #1030430)
    - [413c641] Deallocation of fixed IP occurs before security group refresh
      leading to potential security issue in error / race conditions
      (LP: #1021352)
    - [219c5ca] Race condition in network/deallocate_for_instance() leads to
      security issue (LP: #1021340)
    - [f2bc403] cleanup_file_locks does not remove stale sentinel files
      (LP: #1018586)
    - [4c7d671] Deleting Flavor currently in use by instance creates error
      (LP: #994935)
    - [7e88e39] nova testsuite errors on newer versions of python-boto (e.g.
      2.5.2) (LP: #1027984)
    - [80d3026] NoMoreFloatingIps: Zero floating ips available after repeatedly
      creating and destroying instances over time (LP: #1017418)
    - [4d74631] Launching with source groups under load produces lazy load error
      (LP: #1018721)
    - [08e5128] API 'v1.1/{tenant_id}/os-hosts' does not return a list of hosts
      (LP: #1014925)
    - [801b94a] Restarting nova-compute removes ip packet filters (LP: #1027105)
    - [f6d1f55] instance live migration should create virtual_size disk image
      (LP: #977007)
    - [4b89b4f] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [6e873bc] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [7b215ed] Use default qemu-img cluster size in libvirt connection driver
    - [d3a87a2] Listing flavors with marker set returns 400 (LP: #956096)
    - [cf6a85a] nova-rootwrap hardcodes paths instead of using
      /sbin:/usr/sbin:/usr/bin:/bin (LP: #1013147)
    - [2efc87c] affinity filters don't work if scheduler_hints is None
      (LP: #1007573)
    - [48e5f46] metadata injection is broken in xen (LP: #1022036)
    - [25f5bd3] scheduler hang (DOS) possible with
      DifferentHostFilter/SameHostFilter  (LP: #1017795)
    - [1c1b858] cannot umount guestfs  (LP: #1013689)
    - [835ba4f] not able to get host total memory in xen with libvirt
      (LP: #1004298)
    - [00e5104] Call to network_get_all_by_uuids missing 'db' (LP: #986922)
    - [4c49df7] [nova][volumes] Exceeding volumes, gigabytes and floating_ips
      quotas returns general uninformative HTTP 500 error (LP: #1021373)
    - [19631f3] [nova][volumes] Exceeding volumes quotas logs
      "VolumeSizeTooLarge" instead of "VolumeLimitExceeded"  (LP: #1020634)
    - [b0feaff] Remote arbitrary file corruption / creation flaw via injected
      files (LP: #1015531)
    - [3cb6e57] NoMoreFixedIps: Zero fixed ips available. Nova seems leaking
      them. (LP: #1014769)
    - [5d8431b] ram_allocation_ratio does not work (LP: #1016273)
    - [410060f] test_get_console_output_file requires sudo NOPASSWD
      (LP: #992805)
    - [33c2575] Stop/start a KVM instance with volumes attached produces an
      error state (LP: #1013782)
    - [6c01c01] Backport tox settings to unbreak jenkins jobs.
    - [344125f] Set defaultbranch in .gitreview to stable/essex
    - [9b789be] floating ips are not disassociated from instances on deletion
      (LP: #997763)
    - [d89c2f3] qpid timeout causing compute service to crash (LP: #999698)
    - [caae0e9] floating ips do not display in 'nova list' after association to
      instance (LP: #939122)
    - [1dc9f19] impl_qpid doesn't ACK messages (LP: #1012374)
    - [bc621bc] Restarting nova-network removes ip packet filters
      (LP: #1000853)
    - [7870157] Add caching to openstack.common.cfg
    - [27133ee] Firewall rules from nova-compute are not refreshed after host
      reboot (LP: #985162)
    - [3ee026e] Source group based security group rule without protocol and port
      causes failures (LP: #1010514)
    - [f0a9f47] [SRU] dns_domains table mysql charset is 'latin1'. Should be
      'utf8' (LP: #993663)
    - [cc8fd97] euca-describe-keypair NonExistent returns 200 (LP: #1006664)
    - [9f9e9da] Security groups fail to be set correctly if incorrect case is
      used for protocol specification (LP: #985184)
 -- Adam Gandelman <adamg@canonical.com>   Mon, 27 Aug 2012 14:50:40 -0700

Changed in nova (Ubuntu Precise):
status:	Confirmed → Fix Released

Clint Byrum (clint-fewbar) wrote on 2012-09-03: Update Released

#14

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Thierry Carrez (ttx) on 2012-09-27

Changed in nova:
milestone:	folsom-3 → 2012.2

	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Pádraig Brady	OpenStack Compute (nova) 2012.2 "folsom"
Essex	Fix Released	High	Pádraig Brady	OpenStack Compute (nova) 2012.1.3
nova (Ubuntu)	Fix Released	Undecided	Unassigned
Precise	Fix Released	Undecided	Unassigned