Activity log for bug #1551154

Date Who What changed Old value New value Message
2016-02-29 10:09:04 David Edery bug added bug
2016-02-29 10:09:55 David Edery nova: assignee: (none) → David Edery (david-edery)
2016-02-29 10:30:51 David Edery description: updated (the update swapped the quoted traceback from the instance-spawn path to the snapshot path and added the NOTE on affected releases; the shared text is reproduced once below, with both tracebacks)

Scenario:
---------
1. Create a VM with an SR-IOV port --> the vNIC state should be UP
2. On the VM host, shut down the PF (i.e. ifconfig ens1f1 down) --> the vNIC state should be DOWN
3. Create a snapshot of this degraded VM (a shell sketch of these steps follows below)

Expected behavior:
------------------
Snapshot creation begins and, after a while, succeeds; the VM comes up without the vNIC (if the PF is still down).

Actual behavior:
----------------
Snapshot creation begins and, after a while, fails; the snapshot is deleted. The VM is up without the vNIC (since the PF is still down).
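For reference, a minimal sketch of the reproduction as shell commands. The PF name ens1f1 and the "ifconfig ... down" step are taken from the report; the network/port/VM names and the Kilo-era CLI invocations are illustrative assumptions:

# 1. Create an SR-IOV port and boot a VM with it (names are examples)
neutron port-create sriov-net --name sriov-port --binding:vnic_type direct
nova boot --flavor m1.small --image centos7 --nic port-id=<port-uuid> sriov-vm

# 2. On the VM's compute host, take the PF down and confirm it
#    (virNetDevGetOnline() in libvirt's virnetdev.c tests the IFF_UP flag)
ifconfig ens1f1 down        # or: ip link set ens1f1 down
ip link show ens1f1         # the flags should no longer include UP

# 3. Snapshot the degraded VM -- this is the step that fails
nova image-create sriov-vm sriov-vm-snap --poll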
File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] result = proxy_call(self._autowrap, f, *args, **kwargs) 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] rv = execute(f, *args, **kwargs) 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] six.reraise(c, e, tb) 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] rv = meth(*args, **kwargs) 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1059, in createWithFlags 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self) 2016-02-29 07:21:51.578 34455 TRACE nova.compute.manager [instance: 14a17ed9-8cbd-44f0-9cf6-a93b71de474b] libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online. After a short conversation in #openstack-nova the initial conclusion was that probably the snapshot process shouldn't fail. Further analysis (trying to fix the issue locally): ------------------------------------------------------------------------ Looking at libvirt's code (virnetdev.c) it seems that there's no way to workaround this problem (attaching to an offline PF, though configuration of VLAN and MAC is possible): static int virNetDevReplaceVfConfig(const char *pflinkdev, int vf, const virMacAddr *macaddress, int vlanid, const char *stateDir) { ... /* Assure that PF is online prior to twiddling with the VF. It * *should* be, but if the PF isn't online the changes made to the * VF via the PF won't take effect, yet there will be no error * reported. In the case that it isn't online, fail and report the * error, since setting an unconfigured interface online * automatically turns on IPv6 autoconfig, which may not be what * the admin expects, so we want them to explicitly enable the PF * in the host system network config. */ if (virNetDevGetOnline(pflinkdev, &pfIsOnline) < 0) goto cleanup; if (!pfIsOnline) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unable to configure VF %d of PF '%s' " "because the PF is not online. Please " "change host network config to put the " "PF online."), vf, pflinkdev); goto cleanup; } ... 
Second, the traceback quoted in the updated report, from the snapshot path (same elision of the repeated "2016-02-29 07:07:43.834 34455 TRACE oslo_messaging.rpc.dispatcher" prefix):

2016-02-29 07:04:05.205 34455 WARNING nova.virt.libvirt.driver [req-3f00299f-1c82-4bb4-9283-81dc1d11383d 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] Performing standard snapshot because direct snapshot failed: Image cd938ad4-6e0a-46f3-bc1c-d94644b3fef5 is unacceptable: direct_snapshot() is not implemented
2016-02-29 07:07:43.834 34455 ERROR oslo_messaging.rpc.dispatcher [req-3f00299f-1c82-4bb4-9283-81dc1d11383d 6dc69ac7e48549e0a6f0577e870a096e 7e5da331cb6342fca3688ea4b4df01f4 - - -] Exception during message handling: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
    executor_callback))
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
    executor_callback)
  File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
    result = func(ctxt, **new_args)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6917, in snapshot_instance
    return self.manager.snapshot_instance(ctxt, image_id, instance)
  File "/usr/lib/python2.7/site-packages/nova/exception.py", line 88, in wrapped
    payload)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/exception.py", line 71, in wrapped
    return f(self, context, *args, **kw)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 341, in decorated_function
    LOG.warning(msg, e, instance_uuid=instance_uuid)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 312, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 369, in decorated_function
    kwargs['instance'], e, sys.exc_info())
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 357, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 417, in decorated_function
    instance=instance)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 407, in decorated_function
    *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3297, in snapshot_instance
    task_states.IMAGE_SNAPSHOT)
  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 3327, in _snapshot_instance
    update_task_state)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1471, in snapshot
    update_task_state)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1507, in _generic_snapshot
    instance)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1487, in _snapshot_domain
    self._attach_sriov_ports(context, instance, new_dom)
  File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3092, in _attach_sriov_ports
    dom.attachDevice(cfg.to_xml())
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 530, in attachDevice
    if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
libvirtError: internal error: Unable to configure VF 62 of PF 'ens1f1' because the PF is not online. Please change host network config to put the PF online.

After a short conversation in #openstack-nova, the initial conclusion was that the snapshot process probably shouldn't fail here.

NOTE: This was observed on Kilo, but the exception handling in this code path does not appear to have changed up to and including Mitaka.

Further analysis (trying to fix the issue locally):
---------------------------------------------------
Looking at libvirt's code (virnetdev.c), there seems to be no way to work around this problem: libvirt refuses to attach a VF whose PF is offline, even though configuring the VF's VLAN and MAC would otherwise be possible:

static int
virNetDevReplaceVfConfig(const char *pflinkdev, int vf,
                         const virMacAddr *macaddress,
                         int vlanid,
                         const char *stateDir)
{
...
    /* Assure that PF is online prior to twiddling with the VF. It
     * *should* be, but if the PF isn't online the changes made to the
     * VF via the PF won't take effect, yet there will be no error
     * reported. In the case that it isn't online, fail and report the
     * error, since setting an unconfigured interface online
     * automatically turns on IPv6 autoconfig, which may not be what
     * the admin expects, so we want them to explicitly enable the PF
     * in the host system network config.
     */
    if (virNetDevGetOnline(pflinkdev, &pfIsOnline) < 0)
        goto cleanup;

    if (!pfIsOnline) {
        virReportError(VIR_ERR_INTERNAL_ERROR,
                       _("Unable to configure VF %d of PF '%s' "
                         "because the PF is not online. Please "
                         "change host network config to put the "
                         "PF online."),
                       vf, pflinkdev);
        goto cleanup;
    }
...

(from http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/util/virnetdev.c)

Introducing the following change to libvirt.py (the "manually written" part of the Python bindings for libvirt), which comments out the raise in both attachDevice() and createWithFlags():

--- /usr/lib64/python2.7/site-packages/libvirt.py 2016-02-29 09:20:34.734195406 +0000
+++ /usr/lib64/python2.7/site-packages/libvirt.py.patched 2016-02-29 09:20:26.068155372 +0000
@@ -527,7 +527,7 @@
         into S4 state (also known as hibernation) unless you also modify the
         persistent domain definition. """
         ret = libvirtmod.virDomainAttachDevice(self._o, xml)
-        if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainAttachDevice() failed', dom=self)
         return ret

     def attachDeviceFlags(self, xml, flags=0):
@@ -1056,7 +1056,7 @@
         If the VIR_DOMAIN_START_FORCE_BOOT flag is set, then any managed save
         file for this domain is discarded, and the domain boots from scratch. """
         ret = libvirtmod.virDomainCreateWithFlags(self._o, flags)
-        if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
+        #if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
         return ret

With this change the issue disappeared and the behavior was as expected. The real fix belongs in the snapshot failure handling in Nova, of course, not in libvirt.py; a sketch of that direction follows.
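To make that direction concrete, here is a minimal, hypothetical sketch, not the actual upstream patch. It assumes only the public libvirt Python API visible in the tracebacks (dom.attachDevice(), libvirt.libvirtError); the helper name and the logging are invented for illustration. The idea: in the snapshot path, a failed SR-IOV re-attach becomes a warning rather than a snapshot-aborting error, matching the expected behavior above.

import logging

import libvirt

LOG = logging.getLogger(__name__)


def attach_device_best_effort(dom, device_xml, instance_uuid):
    """Attach a device to a running domain, tolerating failures.

    Intended for the post-snapshot SR-IOV re-attach in
    _snapshot_domain()/_attach_sriov_ports(): if the PF is offline,
    log a warning and let the snapshot complete without the vNIC
    instead of raising and having the snapshot deleted.
    Returns True if the device was attached.
    """
    try:
        dom.attachDevice(device_xml)
        return True
    except libvirt.libvirtError as err:
        # e.g. "internal error: Unable to configure VF 62 of PF
        # 'ens1f1' because the PF is not online." (from virnetdev.c)
        LOG.warning("SR-IOV re-attach failed for instance %s, "
                    "continuing without the vNIC: %s",
                    instance_uuid, err)
        return False

The call site per the traceback, dom.attachDevice(cfg.to_xml()) in _attach_sriov_ports(), would go through a helper like this. Whether to tolerate every libvirtError or only the PF-offline case is the design question the real fix has to answer; err.get_error_code() alone cannot distinguish it, since libvirt reports this as the generic VIR_ERR_INTERNAL_ERROR.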
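Until such a fix lands, the operator-side remediation is the one libvirt's error text already asks for: put the PF back online before snapshotting. Per the virnetdev.c comment above, libvirt deliberately refuses to do this itself, since bringing up an unconfigured interface can switch on IPv6 autoconfig.

ip link set ens1f1 up    # or: ifconfig ens1f1 up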
2016-03-02 23:47:46 OpenStack Infra nova: status: New → In Progress
2016-03-23 10:40:35 Roman Podoliaka nova: importance: Undecided → Medium
2016-03-23 10:42:35 Roman Podoliaka tags: (none) → compute libvirt
2016-04-18 21:08:42 Matt Riedemann tags: compute libvirt → compute libvirt pci snapshot sriov
2016-07-27 12:14:06 Kashyap Chamarthy bug added subscriber Kashyap Chamarthy
2017-06-27 19:25:15 Sean Dague nova: status: In Progress → New
2017-06-27 19:25:19 Sean Dague nova: assignee: David Edery (david-edery) → (none)
2017-07-25 14:17:12 Sean Dague nova: status: New → Incomplete
2017-09-24 04:17:36 Launchpad Janitor nova: status: Incomplete → Expired