2016-02-29 10:30:51
David Edery
Description:
Scenario:
----------------
1. Create a VM with an SR-IOV port --> the vNIC state should be UP
2. On the VM's host, shut down the PF (e.g. ifconfig ens1f1 down) --> the vNIC state should be DOWN
3. Create a snapshot of this degraded VM
Expected behavior:
--------------------------------
Snapshot creation begins; after a while the snapshot succeeds, and the VM comes back up without the vNIC (if the PF is still down)
Actual behavior:
---------------------------
Snapshot creation begins; after a while the snapshot fails and is deleted. The VM comes back up without the vNIC (since the PF is still down)
Analysis:
---------------
From nova-compute.log
2016-02-29 07:21:51.578 34455 ERROR nova.compute.manager [req-00a47655-f6c8-4e53-913f-bef0c61100d0 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Instance failed to spawn
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] Traceback (most recent call last):
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2480, in _build_resources
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] yield resources
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2352, in _build_and_run_instance
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] block_device_info=block_device_info)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 2463, in spawn
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] block_device_info=block_device_info)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4588, in _create_domain_and_network
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] power_on=power_on)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4519, in _create_domain
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] LOG.error(err)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 4509, in _create_domain
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] domain.createWithFlags(launch_flags)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] rv = execute(f, *args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] six.reraise(c, e, tb)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] rv = meth(*args, **kwargs)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.
Earlier in the same nova-compute.log, the snapshot operation itself failed while re-attaching the SR-IOV port to the recreated domain:
2016-02-29 07:04:05.205 34455 WARNING nova.virt.libvirt.driver [req-3f00299f-1c82-4bb4-9283-81dc1d11383d 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] Performing standard snapshot because direct snapshot failed: Image cd938ad4-6e0a-46f3-bc1c-d94644b3fef5 is unacceptable: direct_snapshot() is not implemented
2016-02-29 07:07:43.834 34455 ERROR oslo_messaging.rpc.dispatcher [req-3f00299f-1c82-4bb4-9283-81dc1d11383d 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] Exception during message handling: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6917, in snapshot_instance
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher return self.manager.snapshot_instance(ctxt, image_id, instance)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher payload)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 341, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance_uuid=instance_uuid)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 312, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 369, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher kwargs['instance'], e, sys.exc_info())
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 357, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 417, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher instance=instance)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 407, in decorated_function
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher *args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3297, in snapshot_instance
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher task_states.IMAGE_SNAPSHOT)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3327, in _snapshot_instance
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher update_task_state)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1471, in snapshot
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher update_task_state)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1507, in _generic_snapshot
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher instance)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1487, in _snapshot_domain
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher self._attach_sriov_ports(context, instance, new_dom)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3092, in _attach_sriov_ports
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher dom.attachDevice(cfg.to_xml())
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher rv = execute(f, *args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher six.reraise(c, e, tb)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher rv = meth(*args, **kwargs)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib64/python2.7/site-packages/libvirt.py", line 530, in attachDevice
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.
After a short conversation in #openstack-nova, the initial conclusion was that the snapshot process probably shouldn't fail.
NOTE: This was observed in Kilo, but the relevant exception handling does not appear to have changed as of Mitaka.
Further analysis (trying to fix the issue locally):
------------------------------------------------------------------------
Looking at libvirt's code (virnetdev.c), there seems to be no way to work around this problem (attaching a VF while the PF is offline), even though configuring the VLAN and MAC would technically be possible:
static int
virNetDevReplaceVfConfig(const char *pflinkdev, int vf,
                         const virMacAddr *macaddress,
                         int vlanid,
                         const char *stateDir)
{
    ...
    /* Assure that PF is online prior to twiddling with the VF. It
     * *should* be, but if the PF isn't online the changes made to the
     * VF via the PF won't take effect, yet there will be no error
     * reported. In the case that it isn't online, fail and report the
     * error, since setting an unconfigured interface online
     * automatically turns on IPv6 autoconfig, which may not be what
     * the admin expects, so we want them to explicitly enable the PF
     * in the host system network config.
     */
    if (virNetDevGetOnline(pflinkdev, &pfIsOnline) < 0)
        goto cleanup;

    if (!pfIsOnline) {
        virReportError(VIR_ERR_INTERNAL_ERROR,
                       _("Unable to configure VF %d of PF '%s' "
                         "because the PF is not online. Please "
                         "change host network config to put the "
                         "PF online."),
                       vf, pflinkdev);
        goto cleanup;
    }
    ...
(from http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virnetdev.c)
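Libvirt's notion of the PF being "online" appears to be the interface's administrative up flag (IFF_UP). As a hedged illustration of that check from userspace (not part of libvirt or Nova; `pf_is_online` and the `sysfs_root` parameter are made-up names, the latter existing only so the sketch can be exercised against a fake tree), a caller could pre-check the PF by reading the interface flags from sysfs before attempting the attach:

```python
import os

# Illustrative sketch: approximate libvirt's "PF is online" check by
# reading /sys/class/net/<iface>/flags and testing the IFF_UP bit (0x1).
# pf_is_online and sysfs_root are hypothetical names, not a real API.

IFF_UP = 0x1

def pf_is_online(pf_name, sysfs_root='/sys/class/net'):
    """Return True if the interface is administratively up (IFF_UP set)."""
    path = os.path.join(sysfs_root, pf_name, 'flags')
    try:
        with open(path) as f:
            # The flags file holds a hex string such as "0x1003".
            return bool(int(f.read().strip(), 16) & IFF_UP)
    except (IOError, OSError, ValueError):
        # Missing interface or unreadable flags: treat as offline.
        return False
```

Such a pre-check could only make the failure earlier and clearer; it cannot bring the PF up, which libvirt deliberately refuses to do because of the IPv6-autoconfig side effect described in the comment above.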
Introducing the following change to libvirt.py (the "manually written part of the Python bindings for libvirt"):
--- /usr/lib64/python2.7/site-packages/libvirt.py 2016-02-29 09:20:34.734195406 +0000
+++ /usr/lib64/python2.7/site-packages/libvirt.py.patched 2016-02-29 09:20:26.068155372 +0000
@@ -527,7 +527,7 @@
         into S4 state (also known as hibernation) unless you also modify the
         persistent domain definition. """
         ret = libvirtmod.virDomainAttachDevice(self._o, xml)
-        if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
         return ret

     def attachDeviceFlags(self, xml, flags=0):
@@ -1056,7 +1056,7 @@
         If the VIR_DOMAIN_START_FORCE_BOOT flag is set, then any managed save
         file for this domain is discarded, and the domain boots from scratch. """
         ret = libvirtmod.virDomainCreateWithFlags(self._o, flags)
-        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
         return ret
Applying this change solved the issue and restored the expected behavior.
The proper fix belongs in the snapshot failure-handling code (and not in libvirt.py, of course).
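The direction suggested above — handling the failure in the snapshot path rather than globally disabling libvirt errors — can be sketched as follows. This is an illustrative sketch only, not Nova's actual code: `libvirtError` here is a stand-in stub for the real binding's exception class, and `attach_sriov_ports_best_effort` is a hypothetical name modeled on the `_attach_sriov_ports` loop visible in the traceback:

```python
# Illustrative sketch only: a best-effort variant of the re-attach loop
# seen in the traceback (_attach_sriov_ports -> dom.attachDevice).
# libvirtError is a stand-in for the real libvirt binding's exception.

class libvirtError(Exception):
    pass

def attach_sriov_ports_best_effort(dom, device_xmls, log=None):
    """Attach each SR-IOV device XML to the domain; collect failures
    instead of raising, so a down PF degrades the VM's networking
    but does not abort (and delete) the snapshot."""
    attached, failed = [], []
    for xml in device_xmls:
        try:
            dom.attachDevice(xml)
            attached.append(xml)
        except libvirtError as err:
            if log:
                log('skipping SR-IOV re-attach: %s' % err)
            failed.append(xml)
    return attached, failed
```

With something along these lines, a down PF would leave the snapshotted VM running without its vNIC — the expected behavior — instead of failing the snapshot.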