LXC instances cannot reboot (reboot from container)

Bug #1506390 reported by Bertrand NOEL on 2015-10-15
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: High
Assigned to: Unassigned

Bug Description

I have an LXC compute node. I can create LXC containers, they work fine.
When I try to reboot containers (reboot initiated from inside the container), the container goes into "SHUTOFF" status / "Shutdown" power state, and does not come back.

If I do a "nova start", the container comes back to "RUNNING", but with the following exception in the logs:
----------
ERROR nova.virt.disk.api [req-63630337-923f-4994-8960-83368c6a192e admin admin] Failed to teardown container filesystem
TRACE nova.virt.disk.api Traceback (most recent call last):
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/virt/disk/api.py", line 472, in teardown_container
TRACE nova.virt.disk.api run_as_root=True, attempts=3)
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/utils.py", line 389, in execute
TRACE nova.virt.disk.api return RootwrapProcessHelper().execute(*cmd, **kwargs)
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/utils.py", line 272, in execute
TRACE nova.virt.disk.api return processutils.execute(*cmd, **kwargs)
TRACE nova.virt.disk.api File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 275, in execute
TRACE nova.virt.disk.api cmd=sanitized_cmd)
TRACE nova.virt.disk.api ProcessExecutionError: Unexpected error while running command.
TRACE nova.virt.disk.api Command: sudo nova-rootwrap /etc/nova/rootwrap.conf losetup --detach /dev/loop0
TRACE nova.virt.disk.api Exit code: 1
TRACE nova.virt.disk.api Stdout: u''
TRACE nova.virt.disk.api Stderr: u"loop: can't delete device /dev/loop0: No such device or address\n"
TRACE nova.virt.disk.api
----------
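The failing teardown step is a plain `losetup --detach` that is attempted (and retried) even when the loop device has already been freed, which is exactly what produces "No such device or address". A minimal sketch of an idempotent detach, for illustration only (the `detach_loop` helper name is made up; this is not nova's actual code):

```shell
# Idempotent loop-device cleanup: only detach if still attached.
# `losetup <dev>` prints status and exits 0 only while the device is
# attached; detaching an already-freed device fails with
# "can't delete device ...: No such device or address".
detach_loop() {
  dev="$1"
  if losetup "$dev" >/dev/null 2>&1; then
    losetup --detach "$dev"
  else
    echo "$dev already detached, nothing to do"
  fi
}
```

A teardown path written this way would treat the already-detached case as success instead of raising ProcessExecutionError after three attempts.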

Tested on Juno/Kilo/Liberty/master, on Ubuntu 14.04.
(Note that in Juno, "nova start" does not even work.)

Below is my Devstack recipe if needed:
---------
sudo mkdir -p /opt/stack
sudo chown $USER /opt/stack
git clone -b stable/liberty https://git.openstack.org/openstack-dev/devstack /opt/stack/devstack

cat > /opt/stack/devstack/local.conf << END
[[local|localrc]]

VIRT_DRIVER=libvirt
LIBVIRT_TYPE=lxc

disable_service heat h-api h-api-cfn h-api-cw h-eng
disable_service horizon
disable_service tempest
disable_service c-sch c-api c-vol
disable_service s-proxy s-object s-container s-account
disable_service q-svc q-agt q-dhcp q-l3 q-meta neutron
disable_service tempest

DATABASE_PASSWORD=password
RABBIT_PASSWORD=password
SERVICE_TOKEN=password
SERVICE_PASSWORD=password
ADMIN_PASSWORD=password
END

cd /opt/stack/devstack/
./stack.sh
---------
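For reference, the reboot-from-inside behavior can be reproduced from the CLI roughly as follows (the image, flavor, and instance name below are placeholders, not taken from the report):

```shell
# Boot an LXC-backed instance on the DevStack node above.
nova boot --image cirros --flavor m1.tiny lxc-test

# Inside the guest console, run: sudo reboot
# The instance then shows status SHUTOFF / power state Shutdown:
nova list

# Starting it again triggers the teardown error in the nova-compute log:
nova start lxc-test
```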

Michael Petersen (mpetason) wrote :

Is this happening with a specific image? I tested with Cirros on a base install but wasn't able to replicate the issue.

Reboot within the instance did put the instance in a shutdown state. "nova start" was able to start the instance. I wasn't able to see any errors in the logs.

Mark Doffman (mjdoffma) wrote :

Confirmed. I thought that this was an issue with an earlier version of libvirt (pre-1.2.8), as seen in this bug:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/914716

However, I have tried this again on Ubuntu 15.04 with a more recent version of libvirt; same issue.

As with Michael, I cannot see any errors in the logs, but the instance does not reboot.

I'm not sure what the proper behavior is here. Using LXC directly, a CirrOS image reboots just fine.

Changed in nova:
status: New → Confirmed

Oh, the error occurs on CentOS 7. I mixed up my tests. Very sorry about that.

So, to recap, I tried on CentOS 7 and Ubuntu 14.04. On both, rebooting my container makes it go to SHUTOFF/Shutdown. That is not normal behavior, right?

After that:
On CentOS, when I do a "nova start", I get the "can't delete device /dev/loop0" error. But the container comes back to RUNNING anyway, and I can use and then delete it without any errors.
On Ubuntu, when I do a "nova start", I don't get any errors. The container goes back to RUNNING, and I can use it. But I realized that afterwards I cannot delete the container; I get the exception: "libvirtError: Failed to kill process 12860: Permission denied"

Michael Petersen (mpetason) wrote :

Mark - Did you test this on a version of libvirt that was newer than 1.2.8?

There are permission issues even with libvirt 1.2.2:

Ubuntu: 14.04.3

libvirtd (libvirt) 1.2.2

/var/log/libvirt/libvirtd.log

2015-10-19 19:54:30.799+0000: 19818: warning : virLXCProcessMonitorInitNotify:615 : Cannot obtain pid NS inode for 20236: Unable to stat /proc/20236/ns/pid: Permission denied
2015-10-19 19:56:09.601+0000: 19818: warning : virLXCProcessReboot:124 : Unable to handle reboot of vm instance-00000001

2015-10-19 19:54:30.799+0000: 19818: error : virLXCProcessGetNsInode:582 : Unable to stat /proc/20236/ns/pid: Permission denied
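The "Cannot obtain pid NS inode" warning comes from libvirt stat-ing the container init's pid namespace under /proc; the same lookup can be reproduced by hand for any process you are allowed to read (here, the current shell):

```shell
# libvirt's virLXCProcessGetNsInode effectively stats
# /proc/<pid>/ns/pid and reads its inode number. The "Permission
# denied" in the log means libvirtd could not stat that path for the
# container's init process. The equivalent check for our own shell:
stat -c %i "/proc/$$/ns/pid"   # prints the pid-namespace inode number
```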

There are known issues with Ubuntu and LXC containers related to the AppArmor profile. The default installation of DevStack + libvirt + LXC does not install the AppArmor profiles, but I'm not sure whether that is causing the issue. I was able to disable it; however, I wasn't able to try with a newer version of libvirt yet.
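To check whether AppArmor is in play on the node, one possible sequence (the sysfs path is the standard location on Ubuntu; disabling requires root and is shown only as commented-out commands):

```shell
# Non-destructive check: the AppArmor module reports Y/N here when
# loaded; the file is readable without root.
if [ -r /sys/module/apparmor/parameters/enabled ]; then
  cat /sys/module/apparmor/parameters/enabled   # prints Y when active
else
  echo "apparmor module not loaded"
fi

# To rule AppArmor out for libvirtd (as root; undo by removing the
# symlink and re-parsing the profile):
#   ln -s /etc/apparmor.d/usr.sbin.libvirtd /etc/apparmor.d/disable/
#   apparmor_parser -R /etc/apparmor.d/usr.sbin.libvirtd
```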

Mark Doffman (mjdoffma) wrote :

Michael - I tried with 1.2.2 and 1.2.12 with the same results.

Good to know about the apparmor profiles. I'll see if I can try again with that issue resolved.

Bertrand - Can't remember now if I checked the logs before or after restarting the instance. Will make sure that I check after next time.

Changed in nova:
importance: Undecided → High