"Cannot find device" during _create_veth_pair

Bug #1531212 reported by Alexis Lee on 2016-01-05
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Unassigned

Bug Description

While running Kilo in a scale test, my colleague observed a stack trace on a compute host which boils down to:

    Cannot find device "qvbcdc5cf8d-ad"

Checking the log, I found:

    CMD "sudo nova-rootwrap /opt/stack/service/nova-compute/etc/nova/rootwrap.conf ip link add qvb59fd2beb-0c type veth peer name qvo59fd2beb-0c" returned: 0 in 0.091s
    CMD "sudo nova-rootwrap /opt/stack/service/nova-compute/etc/nova/rootwrap.conf ip link set qvb59fd2beb-0c up" returned: 1 in 0.092s
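
A log containing many such lines can be scanned for nonzero exits mechanically. The following is a minimal sketch, assuming every command line follows the `CMD "..." returned: N in Ns` shape shown above. `CMD_RE` and `failed_commands` are hypothetical names for illustration and are not part of nova.

```python
import re

# Matches the 'CMD "<command>" returned: <rc> in <seconds>s' shape seen in the log.
CMD_RE = re.compile(r'CMD "(?P<cmd>.+)" returned: (?P<rc>-?\d+) in (?P<secs>[\d.]+)s')


def failed_commands(lines):
    """Return (command, exit_code) for every logged command that exited nonzero."""
    failures = []
    for line in lines:
        match = CMD_RE.search(line)
        if match and int(match.group('rc')) != 0:
            failures.append((match.group('cmd'), int(match.group('rc'))))
    return failures
```

Fed the two lines above, this would report only the `ip link set ... up` command, since `ip link add` returned 0.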

There was also a stack trace pointing to line 1357 of nova/network/linux_net.py, which is in _create_veth_pair. The code here is very simple: it deletes the devices if they exist, runs `ip link add ...` then `ip link set ...`. The code has not changed since Kilo.
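
For readers without the source to hand, the sequence described above can be sketched roughly as follows. This is a reading of the description and the stack trace, not a copy of nova's code: the real function may do more, and the `execute` and `device_exists` parameters are assumptions introduced here so the sketch can be run without root.

```python
def create_veth_pair(dev1, dev2, execute, device_exists):
    """Sketch of the described flow: delete leftovers, add the pair, bring both ends up.

    execute(*cmd) is assumed to run a command and raise on a nonzero exit;
    device_exists(name) is assumed to return True if the interface exists.
    """
    # Remove any devices left over from an earlier attempt.
    for dev in (dev1, dev2):
        if device_exists(dev):
            execute('ip', 'link', 'delete', dev)

    # Create both ends of the veth pair in one command.
    execute('ip', 'link', 'add', dev1, 'type', 'veth', 'peer', 'name', dev2)

    # Bring each end up; this is the step that failed in the trace.
    for dev in (dev1, dev2):
        execute('ip', 'link', 'set', dev, 'up')
```

Nothing in this flow checks that the device is still present between the `add` and the first `set`, which is the window the failure falls into.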

So it seems that either `ip link add ...` should have returned nonzero and didn't, or another thread (maybe of another process) managed to delete the devices between the two commands. Neither seems particularly likely: `ip` is presumably pretty robust, and the device names are randomly generated, so a collision would be surprising.
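
One way to tell the two hypotheses apart might be to poll for the device immediately after `ip link add` returns 0. The sketch below assumes a Linux host, where network interfaces appear under `/sys/class/net`. `wait_for_device` is a hypothetical helper for diagnosis only and is not part of nova.

```python
import os
import time


def wait_for_device(dev, attempts=5, delay=0.1,
                    exists=lambda d: os.path.exists('/sys/class/net/' + d),
                    sleep=time.sleep):
    """Poll until dev is visible.

    Returns the number of polls it took, or None if the device never appeared.
    """
    for attempt in range(1, attempts + 1):
        if exists(dev):
            return attempt
        if attempt < attempts:
            # Pause only between polls, not after the last one.
            sleep(delay)
    return None
```

If the device never shows up even though `ip link add` returned 0, that would point towards the first hypothesis. If it is visible at first and gone by the time `ip link set` runs, something else is more likely to have deleted it.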

Full stack trace:

2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/openstack/common/threadgroup.py", line 145, in wait
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup x.wait()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/openstack/common/threadgroup.py", line 47, in wait
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "eventlet/greenthread.py", line 175, in wait
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "eventlet/event.py", line 121, in wait
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "eventlet/hubs/hub.py", line 294, in switch
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "eventlet/greenthread.py", line 214, in main
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/openstack/common/service.py", line 502, in run_service
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup service.start()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/service.py", line 164, in start
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self.manager.init_host()
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/compute/manager.py", line 1298, in init_host
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self._init_instance(context, instance)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/compute/manager.py", line 1133, in _init_instance
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self.driver.plug_vifs(instance, net_info)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/virt/libvirt/driver.py", line 604, in plug_vifs
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self.vif_driver.plug(instance, vif)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/virt/libvirt/vif.py", line 609, in plug
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup func(instance, vif)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/virt/libvirt/vif.py", line 447, in plug_ovs
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self.plug_ovs_hybrid(instance, vif)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/virt/libvirt/vif.py", line 443, in plug_ovs_hybrid
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup self._plug_bridge_with_port(instance, vif, port='ovs')
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/virt/libvirt/vif.py", line 424, in _plug_bridge_with_port
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup linux_net._create_veth_pair(v1_name, v2_name)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/network/linux_net.py", line 1352, in _create_veth_pair
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup utils.execute('ip', 'link', 'set', dev, 'up', run_as_root=True)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "nova/utils.py", line 207, in execute
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup return processutils.execute(*cmd, **kwargs)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup File "oslo_concurrency/processutils.py", line 266, in execute
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup cmd=sanitized_cmd)
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup ProcessExecutionError: Unexpected error while running command.
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ip link set qvb59fd2beb-0c up
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup Exit code: 1
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup Stdout: u''
2015-12-13 18:34:58.876 1069 TRACE nova.openstack.common.threadgroup Stderr: u'Cannot find device "qvb59fd2beb-0c"\n'

Alexis Lee (alexisl) wrote :

Running this all night yielded no problems:

    while true; do
        sudo ip link add aaa type veth peer name bbb
        sudo ip link set aaa up
        sudo ip link delete aaa
        sudo ip link delete bbb > /dev/null 2>&1
        echo -n "."
    done

Fix proposed to branch: master
Review: https://review.openstack.org/264146

Changed in nova:
assignee: nobody → Alexis Lee (alexisl)
status: New → In Progress
tags: added: kilo-backport-potential liberty-backport-potential
Alexis Lee (alexisl) on 2016-08-31
Changed in nova:
assignee: Alexis Lee (alexisl) → nobody
assignee: nobody → Alexis Lee (alexisl)

Change abandoned by Alexis Lee (<email address hidden>) on branch: master
Review: https://review.openstack.org/264146

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: Alexis Lee (alexisl) → nobody
Sean Dague (sdague) wrote :

Automatically discovered version kilo in description. If this is incorrect, please update the description to include 'nova version: ...'

tags: added: openstack-version.kilo
Sean Dague (sdague) wrote :

This is a kilo era bug from a long time ago. Marking as Invalid. Please reopen if this is seen again.

Changed in nova:
status: New → Invalid