destroy-environment fails to clear lxc containers

Bug #1307215 reported by Evan on 2014-04-13
This bug affects 3 people
Affects        Status         Importance  Assigned to  Milestone
juju-core      Invalid        High        Unassigned
lxc (Ubuntu)   Fix Released   Medium      Unassigned

Bug Description

Running destroy-environment with --force cleans up evan-local-machine-1, but fails to remove evan-local-machine-2 and evan-local-machine-3. I think this is what led to bug 1303778 for me.

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y
[sudo] password for evan:
ERROR failed to destroy lxc container: error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-2; Destroying evan-local-machine-2 failed
ERROR error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-2; Destroying evan-local-machine-2 failed
ERROR exit status 1

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y
ERROR failed to destroy lxc container: error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-3; Destroying evan-local-machine-3 failed
ERROR error executing "lxc-destroy": lxc_container: Error destroying rootfs for evan-local-machine-3; Destroying evan-local-machine-3 failed
ERROR exit status 1

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO

  % juju destroy-environment local --force
WARNING! this command will destroy the "local" environment (type: local)
This includes all machines, services, data and other resources.

Continue [y/N]? y

 % sudo lxc-ls -f
NAME STATE IPV4 IPV6 AUTOSTART
-----------------------------------------------------
evan-local-machine-2 STOPPED - - YES
evan-local-machine-3 STOPPED - - YES
juju-precise-template STOPPED - - NO
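The manual check above (run destroy-environment, then look at `lxc-ls -f`) can be scripted. The following is only a sketch: `leftover_containers` is a hypothetical helper, and the `evan-local-machine-` prefix is an assumption read off the transcript, not a documented naming rule.

```python
def leftover_containers(lxc_ls_output, prefix):
    """Return container names from `lxc-ls -f` output that start with prefix.

    The header row (NAME ...) and the dashed separator row are skipped.
    """
    names = []
    for line in lxc_ls_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[0]
        # Skip the column header and the "-----" separator line.
        if name == "NAME" or set(name) == {"-"}:
            continue
        if name.startswith(prefix):
            names.append(name)
    return names
```

Feeding it the last listing above with the prefix `evan-local-machine-` yields the two containers that survived, while `juju-precise-template` is ignored.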

Tim Penhey (thumper) wrote :

Hi Evan, Can I get you to please run and pastebin the following?

juju destroy-environment local --logging-config="golxc=TRACE;juju=DEBUG" --show-log

This will give us the output of the lxc command and why it is failing.
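One caveat when pasting that command: a bare `;` is a command separator in most shells, so the logging config has to reach juju as a single argument. A minimal sketch of one way to sidestep shell quoting, assuming the command is launched from Python; `destroy_environment_argv` and `shell_line` are hypothetical helper names, and the flags are the ones quoted above.

```python
import shlex


def destroy_environment_argv(env_name, logging_config="golxc=TRACE;juju=DEBUG"):
    """Build the argument list for the debug run requested above.

    Handing a list (not a string) to subprocess bypasses the shell, so the
    ';' inside the logging config stays part of one argument.
    """
    return [
        "juju",
        "destroy-environment",
        env_name,
        "--logging-config=" + logging_config,
        "--show-log",
    ]


def shell_line(argv):
    """Render argv as a copy-pasteable shell line with safe quoting."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

`subprocess.call(destroy_environment_argv("local"))` would run it directly, and `shell_line(...)` gives a quoted form suitable for pasting into a terminal.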

Curtis Hovey (sinzui) on 2014-04-14
tags: added: destroy-environment local-provider lxc
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.20.0
Evan (ev) wrote :

I've been unable to reproduce this thus far, but I'll keep at it.

John A Meinel (jameinel) wrote :

Until we can reproduce this, I don't think we can address it.

Changed in juju-core:
status: Triaged → Incomplete
Curtis Hovey (sinzui) on 2014-05-01
Changed in juju-core:
milestone: 1.20.0 → none
Curtis Hovey (sinzui) on 2014-05-12
Changed in juju-core:
importance: High → Medium
Launchpad Janitor (janitor) wrote :

[Expired for juju-core because there has been no activity for 60 days.]

Changed in juju-core:
status: Incomplete → Expired
Evan (ev) wrote :

I've had this happen again. It looks like lxc cannot remove the rootfs subvolume because it references other subvolumes:
http://lxr.free-electrons.com/source/fs/btrfs/ioctl.c#L1894

(.venv-ubuntu)vagrant@vagrant-ubuntu-trusty-64:/ev/bzr/uci-engine/ceph$ sudo strace -e file lxc-destroy --force --logpriority=DEBUG --name vagrant-local-machine-1
execve("/usr/bin/lxc-destroy", ["lxc-destroy", "--force", "--logpriority=DEBUG", "--name", "vagrant-local-machine-1"], [/* 16 vars */]) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/liblxc.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libapparmor.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/usr/lib/x86_64-linux-gnu/libseccomp.so.2", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libcgmanager.so.0", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libnih.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libnih-dbus.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdbus-1.so.3", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libutil.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/librt.so.1", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libpcre.so.3", O_RDONLY|O_CLOEXEC) = 3
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
statfs("/sys/fs/selinux", 0x7fffe99765f0) = -1 ENOENT (No such file or directory)
statfs("/selinux", 0x7fffe99765f0) = -1 ENOENT (No such file or directory)
open("/proc/filesystems", O_RDONLY) = 3
open("/proc/cgroups", O_RDONLY|O_CLOEXEC) = 3
stat("/sys/kernel/se...

Changed in juju-core:
status: Expired → New
Evan (ev) wrote :

Yup, that's definitely it. Deleting the subvolumes under /var/lib/lxc/vagrant-local-machine-*/rootfs/srv/disk/{current,snap_} freed up /var/lib/lxc/vagrant-local-machine-* so that lxc-destroy worked.

Evan (ev) wrote :

To further clarify, this isn't a juju bug. lxc should be smart enough to delete dependent subvolumes before it deletes the rootfs subvolume. I took a stab at this over the weekend, but ran out of time. btrfs_destroy(struct bdev *orig) is what you're after.
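The ordering described here (dependent subvolumes first, rootfs last) can be sketched as follows. This is not the fix that went into lxc, only an illustration of the idea; `deletion_order` is a hypothetical helper, and the paths are assumed to be relative to one btrfs mount, as `btrfs subvolume list` prints them.

```python
import posixpath


def deletion_order(rootfs, subvolume_paths):
    """Return every subvolume nested under rootfs, deepest first, then rootfs.

    A subvolume cannot be removed while it still contains other subvolumes,
    so children must always come before their parents.
    """
    rootfs = posixpath.normpath(rootfs)
    # The trailing slash stops "machine-2/rootfs" matching "machine-25/rootfs/...".
    prefix = rootfs + "/"
    nested = {posixpath.normpath(p) for p in subvolume_paths}
    nested = [p for p in nested if p.startswith(prefix)]
    # More path separators means deeper nesting; ties are broken
    # alphabetically so the result is deterministic.
    nested.sort(key=lambda p: (-p.count("/"), p))
    return nested + [rootfs]
```

Each returned path could then be handed to `btrfs subvolume delete` in order, after which lxc-destroy has nothing left blocking it.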

Serge Hallyn (serge-hallyn) wrote :

To be sure I understand, the issue is that inside your container you created btrfs subvolumes (by using btrfs containers or manually using btrfs subvolume create)?

Curtis Hovey (sinzui) on 2014-07-14
Changed in juju-core:
status: New → Triaged
importance: Medium → High
importance: High → Medium
Evan (ev) wrote :

Correct. Attached is a juju-deployer config that reproduces the issue when used with the juju local provider and /var/lib/lxc in the host backed onto btrfs. If you deploy it and run juju destroy-environment --force -y local, it should fail.

This is because Ceph sees that /srv/ceph is on btrfs and makes use of it.

ubuntu@vagrant-local-machine-1:~$ mount
/dev/sda2 on / type btrfs (rw)

So you'll end up with lots of snapshots created by Ceph inside LXC. These will be visible by running `btrfs subvolume list /var/lib/lxc` in the host:

(.venv-ubuntu)vagrant@vagrant-ubuntu-trusty-64:~$ sudo btrfs subvolume list /var/lib/lxc
ID 256 gen 9268 top level 5 path @lxc
ID 257 gen 8126 top level 256 path juju-precise-template/rootfs
ID 285 gen 8006 top level 256 path juju-trusty-template/rootfs
ID 762 gen 9208 top level 256 path vagrant-local-machine-25/rootfs
ID 763 gen 9208 top level 256 path vagrant-local-machine-26/rootfs
ID 764 gen 9208 top level 256 path vagrant-local-machine-27/rootfs
ID 925 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/current
ID 934 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/current
ID 944 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/current
ID 951 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/snap_4426
ID 954 gen 9208 top level 256 path vagrant-local-machine-26/rootfs/srv/ceph/snap_4483
ID 957 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/snap_4767
ID 958 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/snap_5753
ID 959 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/snap_4845
ID 960 gen 9208 top level 256 path vagrant-local-machine-27/rootfs/srv/ceph/snap_5754
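To see at a glance which containers are affected, the listing above can be grouped by container. A rough sketch, assuming the line format shown in this output (`ID ... gen ... top level ... path ...`); `parse_subvolume_list` and `nested_by_container` are hypothetical helpers, and `SAMPLE` is a subset of the lines above.

```python
import re
from collections import defaultdict

# Matches one line of `btrfs subvolume list` output as shown above.
LINE_RE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")

SAMPLE = """\
ID 256 gen 9268 top level 5 path @lxc
ID 257 gen 8126 top level 256 path juju-precise-template/rootfs
ID 762 gen 9208 top level 256 path vagrant-local-machine-25/rootfs
ID 944 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/current
ID 957 gen 9208 top level 256 path vagrant-local-machine-25/rootfs/srv/ceph/snap_4767
"""


def parse_subvolume_list(text):
    """Parse the listing into (subvolume id, path) tuples, skipping other lines."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((int(match.group(1)), match.group(4)))
    return entries


def nested_by_container(entries, marker="/rootfs/"):
    """Group subvolumes that live inside a container rootfs by container name."""
    grouped = defaultdict(list)
    for _subvol_id, path in entries:
        container, sep, inner = path.partition(marker)
        if sep:  # only paths that continue below .../rootfs/
            grouped[container].append(inner)
    return dict(grouped)
```

Any container that appears as a key in `nested_by_container(parse_subvolume_list(SAMPLE))` has subvolumes inside its rootfs and would be expected to hit the lxc-destroy failure.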

Changed in lxc (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Curtis Hovey (sinzui) on 2014-08-05
tags: added: ubuntu-engineering
Changed in juju-core:
importance: Medium → High
milestone: none → next-stable
Serge Hallyn (serge-hallyn) wrote :

The fix for this is applied in lxc's git HEAD.

Changed in lxc (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package lxc - 1.1.0~alpha2-0ubuntu2

---------------
lxc (1.1.0~alpha2-0ubuntu2) utopic; urgency=medium

  * Cherry-pick upstream bugfix for lxc-usernic test.
 -- Stephane Graber <email address hidden> Thu, 02 Oct 2014 15:01:56 -0400

Changed in lxc (Ubuntu):
status: Fix Committed → Fix Released
Curtis Hovey (sinzui) on 2014-10-06
Changed in juju-core:
status: Triaged → Invalid
Curtis Hovey (sinzui) on 2014-10-21
Changed in juju-core:
milestone: next-stable → none