Nova hard reboot fails to mount logical volume (LVM + libvirt-lxc)

Bug #1552740 reported by Thomas Maddox
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Discovered initially with the experimental libvirt-lxc tempest gate job, but pared down to a simpler reproduction using Devstack on a node like the ones used in our CI for devstack-gate tests. Here's an etherpad with many details: https://etherpad.openstack.org/p/lxc_driver_devstack_gate.

The gist of it is that there appears to be a bug where hard rebooting a libvirt-lxc instance in nova, when using the LVM storage backend, will sometimes fail when nova goes to mount the LV:

```
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 138, in _dispatch_and_reply
    incoming.message))
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _dispatch
    return self._do_dispatch(endpoint, method, ctxt, args)
  File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 127, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/opt/stack/nova/nova/exception.py", line 110, in wrapped
    payload)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/exception.py", line 89, in wrapped
    return f(self, context, *args, **kw)
  File "/opt/stack/nova/nova/compute/manager.py", line 359, in decorated_function
    LOG.warning(msg, e, instance=instance)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 328, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 409, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 387, in decorated_function
    kwargs['instance'], e, sys.exc_info())
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 375, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/opt/stack/nova/nova/compute/manager.py", line 3061, in reboot_instance
    self._set_instance_obj_error_state(context, instance)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/compute/manager.py", line 3042, in reboot_instance
    bad_volumes_callback=bad_volumes_callback)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2404, in reboot
    block_device_info)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2501, in _hard_reboot
    vifs_already_plugged=True)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4904, in _create_domain_and_network
    block_device_info, disk_info):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4814, in _lxc_disk_handler
    block_device_info, disk_info)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4764, in _create_domain_setup_lxc
    container_dir=container_dir)
  File "/opt/stack/nova/nova/virt/disk/api.py", line 428, in setup_container
    raise exception.NovaException(img.errors)
NovaException:
--
Failed to mount filesystem: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf mount /dev/stack-volumes-default/692321ba-dd42-4c31-84af-10ca2f10324d_disk /opt/stack/data/nova/instances/692321ba-dd42-4c31-84af-10ca2f10324d/rootfs
Exit code: 32
Stdout: u''
Stderr: u'mount: /dev/mapper/stack--volumes--default-692321ba--dd42--4c31--84af--10ca2f10324d_disk already mounted or /opt/stack/data/nova/instances/692321ba-dd42-4c31-84af-10ca2f10324d/rootfs busy\n'
```
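The exit code 32 / "already mounted or busy" message suggests the logical volume may still be attached to the instance's rootfs directory from before the reboot. As a rough way to check on the compute host (paths copied from the failure above; substitute the UUID of the affected instance), something like this should tell you whether the mount is stale or the directory is merely held open:

```
# Is the instance's rootfs directory still a mount point?
findmnt /opt/stack/data/nova/instances/692321ba-dd42-4c31-84af-10ca2f10324d/rootfs

# Or look for the LV's device-mapper path in the mount table directly
mount | grep 692321ba-dd42-4c31-84af-10ca2f10324d

# If the directory is only "busy", see what still has it open
sudo lsof +D /opt/stack/data/nova/instances/692321ba-dd42-4c31-84af-10ca2f10324d/rootfs
```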

I can recreate this fairly consistently in devstack on a node with the same form factor as the ones used in our CI devstack-gate jobs:

local.conf:
```
[[local|localrc]]
LIBVIRT_TYPE=lxc
NOVA_BACKEND=LVM
```

```
$ ./stack.sh
```
...

```
$ source openrc
$ for i in `seq 1 1 10`; do ( nova boot --image "cirros-0.3.4-x86_64-rootfs" --flavor 42 test$i & ); done
$ for i in `seq 1 1 10`; do ( nova reboot --hard test$i & ); done
```

After doing so, some of the instances should go into ERROR with the traceback above in the compute log. Booting several instances at once is meant to trigger the issue more reliably; it doesn't always happen, but it has also occurred several times when I've spun up just one instance and tried.
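To see which instances hit it and confirm it's this traceback, something like the following works (the n-cpu log path assumes Devstack logging to /opt/stack/logs; adjust for your setup):

```
$ nova list | grep ERROR
$ grep -n "Failed to mount filesystem" /opt/stack/logs/n-cpu.log
```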

I am running HEAD in Devstack when I see this problem.

Note: On this setup, nova's soft reboot is falling through to hard reboot, I believe due to this bug: https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1536280.
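If you want to check whether you are also affected by that libvirt issue, comparing the installed libvirt version on the compute host against the one discussed in that bug is probably the quickest route:

```
$ virsh --version
$ dpkg -l | grep libvirt
```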

Tags: lxc
summary: - Nova reboot fails to mount logical volume (LVM + libvirt-lxc)
+ Nova hard reboot fails to mount logical volume (LVM + libvirt-lxc)
Matt Riedemann (mriedem)
tags: added: lxc
Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Changed in nova:
assignee: nobody → Thomas Maddox (thomas-maddox)
Changed in nova:
assignee: Thomas Maddox (thomas-maddox) → nobody