LXC instances cannot reboot (reboot from container)

Bug #1506390 reported by Bertrand NOEL
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

I have an LXC compute node. I can create LXC containers, they work fine.
When I try to reboot containers (reboot initiated from inside the container), the container goes into "SHUTOFF" status / "Shutdown" power state, and does not come back.

If I do a "nova start", the container comes back to "RUNNING", but with the following exception in the logs:
----------
ERROR nova.virt.disk.api [req-63630337-923f-4994-8960-83368c6a192e admin admin] Failed to teardown container filesystem
TRACE nova.virt.disk.api Traceback (most recent call last):
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/virt/disk/api.py", line 472, in teardown_container
TRACE nova.virt.disk.api run_as_root=True, attempts=3)
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/utils.py", line 389, in execute
TRACE nova.virt.disk.api return RootwrapProcessHelper().execute(*cmd, **kwargs)
TRACE nova.virt.disk.api File "/opt/stack/nova/nova/utils.py", line 272, in execute
TRACE nova.virt.disk.api return processutils.execute(*cmd, **kwargs)
TRACE nova.virt.disk.api File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py", line 275, in execute
TRACE nova.virt.disk.api cmd=sanitized_cmd)
TRACE nova.virt.disk.api ProcessExecutionError: Unexpected error while running command.
TRACE nova.virt.disk.api Command: sudo nova-rootwrap /etc/nova/rootwrap.conf losetup --detach /dev/loop0
TRACE nova.virt.disk.api Exit code: 1
TRACE nova.virt.disk.api Stdout: u''
TRACE nova.virt.disk.api Stderr: u"loop: can't delete device /dev/loop0: No such device or address\n"
TRACE nova.virt.disk.api
----------
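The losetup failure itself looks benign: by the time nova retries the teardown, the kernel has already released the loop device, so `losetup --detach` fails with ENXIO ("No such device or address"). One possible mitigation, sketched below, is to treat that error as "already detached". This is NOT nova's actual code; `detach_loop` and the injectable `run_cmd` runner are hypothetical names used for illustration:

```python
import subprocess

ALREADY_GONE = "No such device or address"  # ENXIO text from losetup


def detach_loop(device, run_cmd=None, attempts=3):
    """Detach a loop device, treating an already-released device as success.

    run_cmd is injectable for testing; by default it shells out to losetup.
    It must return a (returncode, stderr) tuple.
    """
    if run_cmd is None:
        def run_cmd(dev):
            proc = subprocess.run(["losetup", "--detach", dev],
                                  capture_output=True, text=True)
            return proc.returncode, proc.stderr

    last_err = ""
    for _ in range(attempts):
        rc, stderr = run_cmd(device)
        if rc == 0:
            return True
        if ALREADY_GONE in stderr:
            # The kernel already freed the device; nothing left to tear down.
            return True
        last_err = stderr
    raise RuntimeError("could not detach %s: %s" % (device, last_err))
```

With the stderr quoted in the traceback above, this helper would return success instead of raising, e.g. `detach_loop("/dev/loop0", run_cmd=lambda d: (1, "loop: can't delete device /dev/loop0: No such device or address\n"))`.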

Tested on Juno/Kilo/Liberty/master, on Ubuntu 14.04.
(Note that in Juno, nova start does not even work.)

Below is my Devstack recipe if needed:
---------
sudo mkdir -p /opt/stack
sudo chown $USER /opt/stack
git clone -b stable/liberty https://git.openstack.org/openstack-dev/devstack /opt/stack/devstack

cat > /opt/stack/devstack/local.conf << END
[[local|localrc]]

VIRT_DRIVER=libvirt
LIBVIRT_TYPE=lxc

disable_service heat h-api h-api-cfn h-api-cw h-eng
disable_service horizon
disable_service tempest
disable_service c-sch c-api c-vol
disable_service s-proxy s-object s-container s-account
disable_service q-svc q-agt q-dhcp q-l3 q-meta neutron

DATABASE_PASSWORD=password
RABBIT_PASSWORD=password
SERVICE_TOKEN=password
SERVICE_PASSWORD=password
ADMIN_PASSWORD=password
END

cd /opt/stack/devstack/
./stack.sh
---------

Tags: lxc reboot
Revision history for this message
Michael Petersen (mpetason) wrote :

Is this happening with a specific image? I tested with Cirros on a base install but wasn't able to replicate the issue.

Reboot within the Instance did put the instance in a shutdown state. Nova start was able to start the instance. I wasn't able to see any errors in the logs.

Revision history for this message
Mark Doffman (mjdoffma) wrote :

Confirmed. I thought that this was an issue with an earlier version of libvirt (pre 1.2.8) as seen in this bug:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/914716

However, I have tried this again on Ubuntu 15.04 with a more recent version of libvirt, and I see the same issue.

As with Michael, I cannot see any errors in the logs, but the instance does not reboot.

I'm not sure what the proper behavior is here. Using LXC directly, a CirrOS image reboots just fine.

Changed in nova:
status: New → Confirmed
Revision history for this message
Bertrand NOEL (bertrand-noel-88) wrote :

Oh, the error occurs on CentOS 7. I mixed up my tests. Very sorry about that.

So, to recap, I tried on CentOS 7 and Ubuntu 14.04. On both, rebooting my container makes it go to SHUTOFF/Shutdown. That is not normal behavior, right?

After that:
On CentOS, when I do a nova start, I get the "can't delete device /dev/loop0" error. But the container comes back to RUNNING anyway, and I can use and then delete it without any errors.
On Ubuntu, when I do a nova start, I don't get any errors. The container goes back to RUNNING, and I can use it. But I realized afterwards that I cannot delete the container. I get the exception: "libvirtError: Failed to kill process 12860: Permission denied"

Revision history for this message
Michael Petersen (mpetason) wrote :

Mark - Did you test this on a version of libvirt that was newer than 1.2.8?

There are permission issues even with libvirt 1.2.2:

Ubuntu: 14.04.3

libvirtd (libvirt) 1.2.2

/var/log/libvirt/libvirtd.log

2015-10-19 19:54:30.799+0000: 19818: warning : virLXCProcessMonitorInitNotify:615 : Cannot obtain pid NS inode for 20236: Unable to stat /proc/20236/ns/pid: Permission denied
2015-10-19 19:56:09.601+0000: 19818: warning : virLXCProcessReboot:124 : Unable to handle reboot of vm instance-00000001

2015-10-19 19:54:30.799+0000: 19818: error : virLXCProcessGetNsInode:582 : Unable to stat /proc/20236/ns/pid: Permission denied

There are known issues with Ubuntu and LXC containers related to the AppArmor profile. The default installation of Devstack + libvirt + LXC does not install the AppArmor profiles, but I'm not sure whether that's causing the issue. I was able to disable AppArmor; however, I wasn't able to try with a newer version of libvirt yet.
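A quick way to check whether a node is hitting this failure mode is to scan libvirtd.log for the reboot-failure signatures quoted above. The sketch below is illustrative only; the patterns are derived from the log lines in this report, and `find_reboot_failures` is a hypothetical helper, not part of nova or libvirt:

```python
import re

# Signatures of the LXC reboot failure seen in this report (assumed patterns).
PATTERNS = [
    re.compile(r"virLXCProcessReboot:\d+ : Unable to handle reboot"),
    re.compile(r"Unable to stat /proc/\d+/ns/pid: Permission denied"),
]


def find_reboot_failures(log_text):
    """Return log lines matching known LXC reboot-failure signatures."""
    return [line for line in log_text.splitlines()
            if any(p.search(line) for p in PATTERNS)]
```

Feeding it the contents of /var/log/libvirt/libvirtd.log would pick out the "Unable to handle reboot" and "Permission denied" lines shown above, which points at the namespace-stat denial rather than a nova-side problem.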

Revision history for this message
Mark Doffman (mjdoffma) wrote :

Michael - I tried with 1.2.2 and 1.2.12 with the same results.

Good to know about the apparmor profiles. I'll see if I can try again with that issue resolved.

Bertrand - Can't remember now if I checked the logs before or after restarting the instance. Will make sure that I check after next time.

Changed in nova:
importance: Undecided → High