No net device was found for VF

Bug #1808738 reported by Satish Patel
This bug affects 9 people
Affects: OpenStack Compute (nova)
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

I am running Queens OpenStack with 150 SR-IOV compute nodes and everything has been working well so far, but I am seeing the following WARNING message very frequently and I am not sure whether it is a bug or a configuration issue. Can someone provide clarity on these logs?

The compute node "ostack-compute-sriov-01" is running 2 SR-IOV instances, and each instance has two SR-IOV NICs attached, so a total of 4 VFs are in use on the compute node.

[root@ostack-compute-sriov-01 ~]# virsh list
 Id Name State
----------------------------------------------------
 1 instance-00000540 running
 2 instance-000005c4 running

[root@ostack-compute-sriov-01 ~]# lspci -v | grep -i eth | grep "Virtual Function"
03:09.0 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.1 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.2 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.3 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.4 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.5 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.6 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function
03:09.7 Ethernet controller: Broadcom Limited NetXtreme II BCM57810 10 Gigabit Ethernet Virtual Function

[root@ostack-compute-sriov-01 ~]# ip link show dev eno2
4: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether c4:34:6b:cb:a0:f4 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC fa:16:3e:2d:58:69, vlan 11, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 1 MAC fa:16:3e:a6:67:60, vlan 200, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 2 MAC fa:16:3e:9a:98:e0, vlan 200, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 4 MAC fa:16:3e:7d:ef:0c, vlan 11, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 5 MAC 00:00:00:00:00:00, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 6 MAC 00:00:00:00:00:00, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto
    vf 7 MAC 00:00:00:00:00:00, tx rate 10000 (Mbps), max_tx_rate 10000Mbps, spoof checking on, link-state auto

I am seeing the following WARNING messages on all my compute nodes. Interestingly, all 4 lines carry the same timestamp, so they appear in the log file at essentially the same moment.

2018-12-16 22:11:05.070 40288 WARNING nova.pci.utils [req-0d87b5e4-6ece-4beb-880c-51c7c5835a66 - - - - -] No net device was found for VF 0000:03:09.4: PciDeviceNotFoundById: PCI device 0000:03:09.4 not found
2018-12-16 22:11:05.237 40288 WARNING nova.pci.utils [req-0d87b5e4-6ece-4beb-880c-51c7c5835a66 - - - - -] No net device was found for VF 0000:03:09.1: PciDeviceNotFoundById: PCI device 0000:03:09.1 not found
2018-12-16 22:11:05.242 40288 WARNING nova.pci.utils [req-0d87b5e4-6ece-4beb-880c-51c7c5835a66 - - - - -] No net device was found for VF 0000:03:09.2: PciDeviceNotFoundById: PCI device 0000:03:09.2 not found
2018-12-16 22:11:05.269 40288 WARNING nova.pci.utils [req-0d87b5e4-6ece-4beb-880c-51c7c5835a66 - - - - -] No net device was found for VF 0000:03:09.0: PciDeviceNotFoundById: PCI device 0000:03:09.0 not found

Currently this warning is not causing any issue, but I am worried it might be related to some bigger problem I am not aware of. If it is just an informative message, how do I reduce it? Otherwise it creates a lot of noise in my log spelunking.

Tags: libvirt pci
Revision history for this message
Satish Patel (satish-txt) wrote :

This is what my nova.conf looks like:

[root@ostack-compute-sriov-01 ~]# cat /etc/nova/nova.conf | grep -i pci
# PCI Passthrough
pci_passthrough_whitelist = { "physical_network": "vlan", "devname": "eno2" }
enabled_filters = RetryFilter, AvailabilityZoneFilter, RamFilter, AggregateRamFilter, ComputeFilter, AggregateCoreFilter, DiskFilter, AggregateDiskFilter, AggregateNumInstancesFilter, AggregateIoOpsFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, NUMATopologyFilter, PciPassthroughFilter

Revision history for this message
sean mooney (sean-k-mooney) wrote :

You might be able to filter out these messages by appending nova.pci.utils=ERROR to the default set of log levels defined for this config option:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.default_log_levels
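As a sketch, an override along these lines in nova.conf would silence the warning (only a few of the default entries are shown here; the full default_log_levels list should be copied from the documentation above before appending the override):

```ini
[DEFAULT]
# Sketch: copy the full default list from the Nova docs, then append
# nova.pci.utils=ERROR to suppress the "No net device was found" warnings.
default_log_levels = amqp=WARN,amqplib=WARN,sqlalchemy=WARN,oslo.messaging=INFO,nova.pci.utils=ERROR
```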

Note: this warning is expected if the VF is bound to the vfio-pci driver, as the VF then does not have a corresponding netdev.

The only side effect of not having a netdev is that we cannot discover the NIC feature flags. Currently we do not use them for anything: the code to discover them and store them in the database merged, but the code to schedule on them never did, so this should not cause any issues I am aware of.

Changed in nova:
importance: Undecided → Low
status: New → Triaged
tags: added: libvirt pci
Revision history for this message
sean mooney (sean-k-mooney) wrote :

Note: I am triaging this as Low because it does not cause an operational issue in normal operation, but it does needlessly consume log space. I would suggest we de-escalate the message to DEBUG level, since in many cases there is nothing the operator can do.

The fix from an operator's perspective is to ensure that the VFs are bound back to the vendor VF driver when they are detached from a VM or deallocated on the host. In the past this was done with udev rules, but I am not sure what the correct way to do this is today, or on systems that do not have udev.
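As an illustration only of the udev-based approach mentioned here, a rule like the following could rebind VFs to the vendor driver when they reappear on the host. The vendor/device IDs and the driver name are assumptions for the BCM57810 VFs in this report and would need to be verified for the actual hardware:

```
# /etc/udev/rules.d/99-sriov-vf-rebind.rules -- sketch, not a tested rule.
# Vendor 0x14e4 (Broadcom) / device 0x16af (assumed BCM57810 VF ID):
# force the driver_override to the assumed vendor VF driver (bnx2x) and reprobe.
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x14e4", ATTR{device}=="0x16af", RUN+="/bin/sh -c 'echo bnx2x > /sys/bus/pci/devices/%k/driver_override; echo %k > /sys/bus/pci/drivers_probe'"
```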

This warning exists because there are some non-critical operations that may require the netdevs to exist.

First, without a netdev, I do not believe the libvirt driver can use ethtool ioctls via the libvirt nodedev API to discover the NIC feature flags. Second, it may prevent the use of vnic_type=macvtap: we currently create the macvtap with the VF netdev as the parent netdev, and I do not believe you can create a macvtap from the VF directly.
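For illustration, the macvtap plumbing described here corresponds to a libvirt interface of type 'direct' whose source is the VF's netdev (the device name eno2v3 below is hypothetical):

```xml
<!-- Sketch: macvtap backed by a VF netdev; 'eno2v3' is a made-up VF netdev name -->
<interface type='direct'>
  <source dev='eno2v3' mode='passthrough'/>
  <model type='virtio'/>
</interface>
```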

The NIC feature flags are currently unused, as the discovery code merged but the scheduling code never did. The macvtap vnic type can also be disabled, if your hardware does not support it, by setting
https://docs.openstack.org/neutron/latest/configuration/ml2-conf.html#sriov_driver.vnic_type_blacklist
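A minimal sketch of that option, placed in the SR-IOV agent's ml2 configuration file:

```ini
[sriov_driver]
# Sketch: prevent the SR-IOV mech driver from binding ports with vnic_type=macvtap
vnic_type_blacklist = macvtap
```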

I think this could be a nice low-hanging-fruit bug if we agree to just decrease the log level of the message, but I won't tag it as such, since it would be quite involved to close this bug with a different resolution.

Revision history for this message
Andrey Bubyr (abubyr) wrote :

We have faced this bug on OpenStack Pike too. We also cannot boot a VM with a Neutron port of vnic_type=macvtap:
2019-03-05 12:31:30,695.695 47453 DEBUG nova.network.os_vif_util [req-173f5f59-3173-45ad-a304-8e68b504047d 1c0451fbdf0448ad81ea0a538460bf83 f1412680333743d288cc13ba57055580 - default default] No conversion for VIF type hw_veb yet nova_to_osvif_vif /usr/lib/python2.7/dist-packages/nova/network/os_vif_util.py:494
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [req-173f5f59-3173-45ad-a304-8e68b504047d 1c0451fbdf0448ad81ea0a538460bf83 f1412680333743d288cc13ba57055580 - default default] [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] Instance failed to spawn: PciDeviceNotFoundById: PCI device 0000:19:04.7 not found
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] Traceback (most recent call last):
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2185, in _build_resources
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] yield resources
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2000, in _build_and_run_instance
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] block_device_info=block_device_info)
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2913, in spawn
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] block_device_info=block_device_info)
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5251, in _get_guest_xml
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] context)
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 5066, in _get_guest_config
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] flavor, virt_type, self._host)
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 567, in get_config
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] inst_type, virt_type, host)
2019-03-05 12:31:30,696.696 47453 ERROR nova.compute.manager [instance: d1c6f9d1-c5a7-4520-93db-be53e0e19e9e] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py", line 331, in get_config_hw_veb
2019-03-0...


Revision history for this message
Nikolay Pliashechnikov (npliashechnikov) wrote :

A follow-up to the previous post:
We were trying to wrap a Virtual Function into a macvtap so that NFVs that do not support modern NICs would still get decent performance.
The root cause of our issue was that Nova (and libvirt) require the VFs to be visible to the host OS as network devices, which was not the case for us. We 'sorted' the issue by loading the VF driver (i40evf in our case) manually, and after that we were able to launch a VM with a macvtap port.
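If loading the VF driver manually resolves this, one way to make it persistent on systemd-based hosts (a sketch, assuming the i40evf driver from the comment above; adapt the module name to your NIC) is a modules-load fragment:

```
# /etc/modules-load.d/i40evf.conf -- load the Intel VF driver at boot (example)
i40evf
```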

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello:

I see where the problem could be. "get_available_resource" will call "_get_pci_passthrough_devices" -> "_get_pcidev_info" -> "_get_device_capabilities" -> "_get_pcinet_info" -> "get_net_name_by_vf_pci_address" (here we have the warning message).

When "get_available_resource" is called before a VM is created, the VF port is present in the kernel. But this method is called "as part of a periodic task that records the results in the DB" (from the function description). Once the port is bound to the VM, the network interface is no longer present in the host kernel and the method logs this warning message.

Because this is usual in normal operation, I would recommend lowering the log severity to INFO or DEBUG.
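The call chain above ultimately comes down to a sysfs lookup: a VF bound to its vendor driver exposes a net/ directory under its PCI device, while a VF bound to vfio-pci (or attached to a guest) does not, so there is no netdev name to return and the warning fires. A simplified, hypothetical sketch of that check (not Nova's actual code; the sysfs root is parameterised purely so the idea can be exercised against a fake tree):

```python
import os

def net_name_for_vf(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the netdev name for a VF PCI address, or None.

    Mirrors the idea behind nova.pci.utils.get_net_name_by_vf_pci_address:
    a VF bound to the vendor VF driver has a 'net' directory in sysfs
    containing its interface name; a VF bound to vfio-pci does not.
    """
    net_dir = os.path.join(sysfs_root, pci_addr, "net")
    try:
        names = os.listdir(net_dir)
    except OSError:
        # No netdev: this is where Nova would log
        # "No net device was found for VF <addr>".
        return None
    return names[0] if names else None
```

With a mocked sysfs tree, `net_name_for_vf("0000:03:09.0", sysfs_root=fake)` returns the interface name when the net/ directory exists and None when it does not, matching the two VF states seen in this bug.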

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

Just ran into this issue with a massive SR-IOV deployment on Train. The amount of logs is simply ridiculous. I believe this should really be a DEBUG-level message...

Revision history for this message
sean mooney (sean-k-mooney) wrote :

The reason this was a warning is that it can indicate a problem with the underlying NICs. What we could probably do is only print it once, on startup.

If you are using devname to whitelist the PCI devices, for example, then a device not being bound to a network device could be an error.

In terms of macvtap: yes, it is expected that the VF must not be bound to vfio-pci; that is not a bug, nor related to the warning message.

Yes, the repeated message is from the get_available_resource periodic task. This isn't a case of us not knowing how to fix it, but more that I did not personally feel it was worth fixing; we have much higher-priority issues, so I have never worked on this and no one else has volunteered to work on it.

In the past, udev used to rename the PF and VFs before we started using systemd/BIOS persistent device names, so this warning was also useful to tell operators that that had happened and that they should fix it, but that has not been the case for many years now. It can still rarely happen, but since people should not be using devname anyway, it should be much less of a problem now.

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

I have:

  alias = { "name": "MLNXCX3", "vendor_id": "15b3", "product_id": "1004", "device_type": "type-VF" }
  passthrough_whitelist = { "vendor_id": "15b3", "product_id": "1004" }

This lets us share the InfiniBand devices with VMs.
The warning spams the logs, and no devname is being used.

Revision history for this message
melanie witt (melwitt) wrote :

I noticed that the fix [1] for [2] that landed in Victoria happened to remove the "No net device was found for VF" message. AFAICT that change has inadvertently "fixed" the issue in this bug.

[1] https://review.opendev.org/c/openstack/nova/+/739131
[2] https://launchpad.net/bugs/1883671
