SR-IOV error IndexError: pop from empty list

Bug #1795064 reported by Satish Patel on 2018-09-28
This bug affects 1 person
Affects                     Importance  Assigned to
OpenStack Compute (nova)    Medium      Matt Riedemann
  Ocata                     Medium      Unassigned
  Pike                      Medium      Elod Illes
  Queens                    Medium      Elod Illes
  Rocky                     Medium      Matt Riedemann

Bug Description

I am building SR-IOV support on a compute node running Queens. I have the following NIC with VFs enabled:

[root@ostack-compute-63 ~]# lspci -nn | grep -i eth
03:00.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
03:00.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
03:01.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.2 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.3 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.4 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.5 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.6 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
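
To check which functions in a listing like this are physical functions (PFs) and which are VFs, a small parser helps. This is a hypothetical helper, not part of nova: it relies on the heuristic that, in this listing, the VF device names (taken from the PCI ID database) contain "Virtual Function", and it only handles PCI domain 0 addresses as printed by `lspci -nn`.

```python
import re
from typing import Dict, List

# Sample copied from the `lspci -nn | grep -i eth` output in this report.
SAMPLE = """\
03:00.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
03:00.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:168e] (rev 10)
03:01.0 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.1 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.2 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.3 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.4 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.5 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
03:01.6 Ethernet controller [0200]: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function [14e4:16af]
"""

# "<bus>:<slot>.<function> <description> [<vendor>:<device>]"; the "[0200]"
# class code has no colon, so only the vendor:device pair matches the brackets.
LINE_RE = re.compile(
    r"^(?P<addr>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?P<desc>.*?)"
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)


def classify_functions(text: str) -> Dict[str, List[str]]:
    """Split lspci -nn lines into physical (PF) and virtual (VF) functions."""
    found: Dict[str, List[str]] = {"pf": [], "vf": []}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not PCI device entries
        # Heuristic: here the VF device names contain "Virtual Function".
        kind = "vf" if "Virtual Function" in match.group("desc") else "pf"
        found[kind].append(match.group("addr"))
    return found


if __name__ == "__main__":
    functions = classify_functions(SAMPLE)
    print(len(functions["pf"]), "PFs:", ", ".join(functions["pf"]))
    print(len(functions["vf"]), "VFs:", ", ".join(functions["vf"]))
```

For this listing it reports two PFs (03:00.0 and 03:00.1) and seven VFs (03:01.0 through 03:01.6).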

I have set up everything according to the official documentation, and so far everything looks good.

I created a Neutron port, and when I try to launch an instance I get the following error on the compute node:

2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager [req-095c3f53-a558-4178-84ee-cf79bf7f3c7c eebe97b4bc714b8f814af8a44d08c2a4 2927a06cf30f4f7e938fdda2cc05aed2 - default default] Instance failed network setup after 1 attempt(s): IndexError: pop from empty list
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager Traceback (most recent call last):
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/compute/manager.py", line 1398, in _allocate_network_async
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager bind_host_id=bind_host_id)
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 954, in allocate_for_instance
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager bind_host_id, available_macs, requested_ports_dict)
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1087, in _update_ports_for_instance
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager vif.destroy()
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager self.force_reraise()
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager six.reraise(self.type_, self.value, self.tb)
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1042, in _update_ports_for_instance
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager bind_host_id=bind_host_id)
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1192, in _populate_neutron_extension_values
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager port_req_body)
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/nova/network/neutronv2/api.py", line 1138, in _populate_neutron_binding_profile
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager instance, pci_request_id).pop()
2018-09-28 14:41:53.584 11957 ERROR nova.compute.manager IndexError: pop from empty list

------------- Also, every 60 seconds I get the following error ---------------

2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager rv = execute(f, *args, **kwargs)
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager six.reraise(c, e, tb)
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager rv = meth(*args, **kwargs)
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager File "/openstack/venvs/nova-17.0.8/lib/python2.7/site-packages/libvirt.py", line 4232, in nodeDeviceLookupByName
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager if ret is None:raise libvirtError('virNodeDeviceLookupByName() failed', conn=self)
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager libvirtError: Node device not found: no node device with matching name 'net_enp3s1f4_00_00_00_00_00_00'
2018-09-28 16:22:30.646 28663 ERROR nova.compute.manager
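
The name in this libvirt error can be decoded. Libvirt names network node devices `net_<interface>_<MAC>`, with the MAC's colons replaced by underscores, so the device being looked up embeds an all-zero MAC for `enp3s1f4`. Under systemd's predictable naming, `enp3s1f4` is PCI bus 3, slot 1, function 4, which presumably corresponds to the VF at 03:01.4 in the lspci listing. The two helpers below are hypothetical illustrations of those naming conventions, not nova or libvirt code; they assume PCI domain 0 and a plain `enp<bus>s<slot>f<function>` name with no extra suffixes.

```python
import re


def libvirt_net_device_name(ifname: str, mac: str) -> str:
    """Build a libvirt-style net node device name: net_<ifname>_<MAC with '_' separators>."""
    return "net_%s_%s" % (ifname, "_".join(mac.lower().split(":")))


def pci_address_from_ifname(ifname: str) -> str:
    """Map a systemd 'enp<bus>s<slot>f<function>' name to a 'bus:slot.function' address.

    Assumes PCI domain 0 and no extra name suffixes (hypothetical simplification).
    """
    match = re.fullmatch(r"enp(\d+)s(\d+)f(\d+)", ifname)
    if match is None:
        raise ValueError("unsupported interface name: %s" % ifname)
    bus, slot, function = (int(group) for group in match.groups())
    # systemd prints bus and slot in decimal; lspci prints them as two hex digits.
    return "%02x:%02x.%d" % (bus, slot, function)


if __name__ == "__main__":
    print(libvirt_net_device_name("enp3s1f4", "00:00:00:00:00:00"))
    # → net_enp3s1f4_00_00_00_00_00_00
    print(pci_address_from_ifname("enp3s1f4"))
    # → 03:01.4
```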

I also read this thread, but it didn't help: http://lists.openstack.org/pipermail/openstack/2018-January/045982.html

Satish Patel (satish-txt) wrote :

This is what I found: I was running kernel 4.18.9, and as soon as I downgraded the kernel to 3.10.0-862.11.6.el7.x86_64, the issue was resolved.

Can someone explain what is going on here?

[root@ostack-compute-63 ~]# libvirtd -V
libvirtd (libvirt) 3.9.0

Matt Riedemann (mriedem) wrote :

Well, the code itself is clearly fragile, because it blindly pops a result from what could be an empty list:

def get_instance_pci_devs(inst, request_id=None):
    """Get the devices allocated to one or all requests for an instance.

    - For generic PCI request, the request id is None.
    - For sr-iov networking, the request id is a valid uuid
    - There are a couple of cases where all the PCI devices allocated to an
      instance need to be returned. Refer to libvirt driver that handles
      soft_reboot and hard_boot of 'xen' instances.
    """
    pci_devices = inst.pci_devices
    if pci_devices is None:
        return []
    return [device for device in pci_devices if
                   device.request_id == request_id or request_id == 'all']
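
The failure mode can be reproduced outside nova. In this minimal sketch, the dataclasses are hypothetical stand-ins for nova's Instance and PCI device objects (only the fields the helper touches), and the request IDs are made up. When no claimed device carries the requested `request_id`, the helper returns an empty list, and the caller's `.pop()` raises the same `IndexError` seen in the traceback.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakePciDevice:
    """Hypothetical stand-in for nova's PCI device object; only the fields used here."""
    address: str
    request_id: Optional[str]


@dataclass
class FakeInstance:
    """Hypothetical stand-in for nova's Instance; pci_devices may be None."""
    pci_devices: Optional[List[FakePciDevice]] = field(default_factory=list)


def get_instance_pci_devs(inst, request_id=None):
    """Same filtering logic as the nova helper quoted above."""
    pci_devices = inst.pci_devices
    if pci_devices is None:
        return []
    return [device for device in pci_devices
            if device.request_id == request_id or request_id == 'all']


if __name__ == "__main__":
    # A device exists, but it was claimed for a different (hypothetical) request.
    inst = FakeInstance([FakePciDevice("0000:03:01.4", "other-request")])
    try:
        get_instance_pci_devs(inst, "sriov-port-request").pop()
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: pop from empty list
```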

tags: added: neutron pci sr-iov

Fix proposed to branch: master
Review: https://review.openstack.org/607650

Changed in nova:
assignee: nobody → Matt Riedemann (mriedem)
status: New → In Progress
Matt Riedemann (mriedem) wrote :

This has been around since Juno: https://review.openstack.org/#/c/98828/

Changed in nova:
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

If we can sort out why the devices weren't showing up properly on the host (was the PCI passthrough whitelist configuration correct, or is it a known issue with the kernel version you were using with this type of device?), then we could document it as a known issue...

Satish Patel (satish-txt) wrote :

Matt,

This is what it looks like:

pci_passthrough_whitelist = "{ "physical_network":"vlan", "devname":"eno2" }"
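
One way to rule out a quoting problem in a whitelist value like this is to decode it as JSON in isolation. The sketch below is a hypothetical standalone check, not nova's parser. It assumes the config-file parser strips one pair of matching outer quotes before the JSON is decoded, which is worth confirming for your oslo.config version, and it only verifies that the value is a well-formed JSON object; it does not check whether `eno2` is actually an SR-IOV-capable interface.

```python
import json


def parse_whitelist_entry(raw: str) -> dict:
    """Decode one PCI passthrough whitelist value written as a JSON object.

    Assumption: the config parser strips one pair of matching outer quotes
    before decoding; this helper mimics that step.
    """
    value = raw.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    entry = json.loads(value)  # raises ValueError if the JSON is malformed
    if not isinstance(entry, dict):
        raise ValueError("whitelist entry must be a JSON object")
    return entry


if __name__ == "__main__":
    raw = '"{ "physical_network":"vlan", "devname":"eno2" }"'
    entry = parse_whitelist_entry(raw)
    print(entry["devname"], entry["physical_network"])  # eno2 vlan
```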

Reviewed: https://review.openstack.org/607650
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=035708c37d587e4c5ede7fe80270bdbff98016ac
Submitter: Zuul
Branch: master

commit 035708c37d587e4c5ede7fe80270bdbff98016ac
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 3 12:54:53 2018 -0400

    Handle IndexError in _populate_neutron_binding_profile

    This fixes the code that was blindly pop'ing an entry
    from an empty list of PCI devices claimed by the instance.
    It's not exactly clear how we can get into this situation,
    presumably there was a failure in the actual PCI device
    claim logic in the ResourceTracker - maybe related to the
    configured PCI passthrough whitelist. Regardless, we should
    handle the empty PCI device list in this method and raise
    an appropriate exception to fail the build on this host.

    Change-Id: I401bb74cf6e17c2b72cc62bf8ec03ec58238c44a
    Closes-Bug: #1795064
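
The guard this commit message describes can be sketched as follows. The function and exception names are hypothetical and are not the ones used in the actual nova change; the point is only the pattern: check for an empty list of claimed devices before popping, and raise a descriptive exception so the build on this host fails with a clear message instead of a bare `IndexError`.

```python
from typing import List, Optional


class PciDeviceNotClaimedError(Exception):
    """Hypothetical exception name; not the one used by the actual nova change."""


def pop_claimed_device(devices: List[str], request_id: Optional[str]) -> str:
    """Return the PCI device claimed for request_id, failing clearly if there is none."""
    if not devices:
        # Guard the pop: raise a descriptive error so the build on this host
        # can fail with a clear message rather than "pop from empty list".
        raise PciDeviceNotClaimedError(
            "no PCI device claimed for request %s" % request_id)
    return devices.pop()


if __name__ == "__main__":
    print(pop_claimed_device(["0000:03:01.4"], "sriov-port-request"))
    try:
        pop_claimed_device([], "sriov-port-request")
    except PciDeviceNotClaimedError as exc:
        print("build fails with:", exc)
```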

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/610163
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dfbcf5e40bb51813f56f983e4f75e29a6034a830
Submitter: Zuul
Branch: stable/rocky

commit dfbcf5e40bb51813f56f983e4f75e29a6034a830
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 3 12:54:53 2018 -0400

    Handle IndexError in _populate_neutron_binding_profile

    This fixes the code that was blindly pop'ing an entry
    from an empty list of PCI devices claimed by the instance.
    It's not exactly clear how we can get into this situation,
    presumably there was a failure in the actual PCI device
    claim logic in the ResourceTracker - maybe related to the
    configured PCI passthrough whitelist. Regardless, we should
    handle the empty PCI device list in this method and raise
    an appropriate exception to fail the build on this host.

    Change-Id: I401bb74cf6e17c2b72cc62bf8ec03ec58238c44a
    Closes-Bug: #1795064
    (cherry picked from commit 035708c37d587e4c5ede7fe80270bdbff98016ac)

This issue was fixed in the openstack/nova 18.1.0 release.

Reviewed: https://review.openstack.org/635897
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8369a78af07b224f109586de398c702db342b49d
Submitter: Zuul
Branch: stable/queens

commit 8369a78af07b224f109586de398c702db342b49d
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 3 12:54:53 2018 -0400

    Handle IndexError in _populate_neutron_binding_profile

    This fixes the code that was blindly pop'ing an entry
    from an empty list of PCI devices claimed by the instance.
    It's not exactly clear how we can get into this situation,
    presumably there was a failure in the actual PCI device
    claim logic in the ResourceTracker - maybe related to the
    configured PCI passthrough whitelist. Regardless, we should
    handle the empty PCI device list in this method and raise
    an appropriate exception to fail the build on this host.

    Change-Id: I401bb74cf6e17c2b72cc62bf8ec03ec58238c44a
    Closes-Bug: #1795064
    (cherry picked from commit 035708c37d587e4c5ede7fe80270bdbff98016ac)
    (cherry picked from commit dfbcf5e40bb51813f56f983e4f75e29a6034a830)

Reviewed: https://review.openstack.org/635921
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=8049e9595bfcf041cd81a4dd353f6285a9806100
Submitter: Zuul
Branch: stable/pike

commit 8049e9595bfcf041cd81a4dd353f6285a9806100
Author: Matt Riedemann <email address hidden>
Date: Wed Oct 3 12:54:53 2018 -0400

    Handle IndexError in _populate_neutron_binding_profile

    This fixes the code that was blindly pop'ing an entry
    from an empty list of PCI devices claimed by the instance.
    It's not exactly clear how we can get into this situation,
    presumably there was a failure in the actual PCI device
    claim logic in the ResourceTracker - maybe related to the
    configured PCI passthrough whitelist. Regardless, we should
    handle the empty PCI device list in this method and raise
    an appropriate exception to fail the build on this host.

    Conflicts:
     nova/network/neutronv2/api.py

    Note(elod.illes): conflict caused by two changes not part of
    branch stable/pike: Id847949b4761d51a14e5c2f39552f60a47889aa9
    and Ie3a83fef0dc689b9d37ac43e047ce5d48f567adc

    Change-Id: I401bb74cf6e17c2b72cc62bf8ec03ec58238c44a
    Closes-Bug: #1795064
    (cherry picked from commit 035708c37d587e4c5ede7fe80270bdbff98016ac)
    (cherry picked from commit dfbcf5e40bb51813f56f983e4f75e29a6034a830)
    (cherry picked from commit 8369a78af07b224f109586de398c702db342b49d)

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

This issue was fixed in the openstack/nova 17.0.10 release.

This issue was fixed in the openstack/nova 16.1.8 release.
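
The release notes above can be turned into a quick check of whether a given nova version already contains the fix. This is a hypothetical helper built only from the releases listed in this report (Pike 16.1.8, Queens 17.0.10, Rocky 18.1.0, Stein 19.0.0.0rc1); it assumes later series include the fix because they branched from master after it merged, and it returns `None` for series such as Ocata (15.x), for which no fixed release is listed. The reporter's traceback paths show nova 17.0.8, which predates the Queens fix.

```python
import re
from typing import Optional, Tuple

# First fixed release per nova major version, as listed in this report:
# 16 = Pike, 17 = Queens, 18 = Rocky, 19 = Stein (fixed from 19.0.0.0rc1).
FIXED_IN = {16: (16, 1, 8), 17: (17, 0, 10), 18: (18, 1, 0), 19: (19, 0, 0)}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Take the first three numeric components, e.g. '19.0.0.0rc1' -> (19, 0, 0)."""
    nums = [int(part) for part in re.findall(r"\d+", version)[:3]]
    nums += [0] * (3 - len(nums))
    return nums[0], nums[1], nums[2]


def has_fix(version: str) -> Optional[bool]:
    """True if this nova version includes the fix, False if not, None if unknown."""
    parsed = parse_version(version)
    major = parsed[0]
    if major > max(FIXED_IN):
        return True  # assumption: later series branched after the fix merged
    if major not in FIXED_IN:
        return None  # e.g. Ocata (15.x): no fixed release listed in this report
    return parsed >= FIXED_IN[major]


if __name__ == "__main__":
    for v in ("17.0.8", "17.0.10", "18.0.3", "19.0.0.0rc1"):
        print(v, has_fix(v))
```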
