Comment 27 for bug 1986520

Christian Ehrhardt  (paelzer) wrote :

Hey,
the error message inside libvirt comes from parsing the PCI VPD (Vital Product Data).
If removing that card makes the message go away, that suggests the VPD of that device is either
a) broken as it comes from the device, and needs a FW update (or a report to Intel so they can create one)
or
b) valid VPD data, but uncommon and breaking the parser in libvirt

#1. libvirt
You could gather some info and then decide whether it is more likely (a) or (b), and then report it directly to Intel or to upstream libvirt [1] so they can have a look.

I think to check what libvirt can (or cannot) read, you could run
$ virsh nodedev-list
# now select the i350 OCP
$ virsh nodedev-dumpxml <the i350 device ID from above cmd>

That should cause libvirt to try to read the VPD data of the device and report it to you.
I'd expect this to trigger the issue; maybe you will spot something odd in there already ...
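
In case it helps with picking the right entry out of the nodedev-list output: libvirt names PCI node devices after their PCI address, so 0000:01:00.0 shows up as pci_0000_01_00_0. A rough sketch of that mapping follows; the helper name pci_addr_to_nodedev and the address in the usage comment are made up for illustration, use whatever lspci reports on your system.

```shell
#!/bin/sh
# Convert a full PCI address (domain:bus:slot.function), as printed by
# `lspci -D`, into the corresponding libvirt node device name.
# pci_addr_to_nodedev is a hypothetical helper name, not part of libvirt.
pci_addr_to_nodedev() {
    # Replace every ':' and '.' with '_' and prepend "pci_".
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# Example usage (0000:01:00.0 is only a placeholder address):
#   lspci -D | grep -i i350
#   virsh nodedev-dumpxml "$(pci_addr_to_nodedev 0000:01:00.0)"
```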

#1b. debug
If step 1 recreates the issue, you might want to run the same with libvirt debug info enabled [3]
and report that here and/or upstream, depending on what you see.
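
One possible way to get daemon-side debug output without editing config files is virt-admin, which can change the log filters of a running daemon. The sketch below only builds the filter string in libvirt's "level:match" format (level 1 is debug). The helper name build_log_filters is made up, and the choice of the "util" and "node_device" matches is my assumption about where the VPD code logs, so adjust as needed.

```shell
#!/bin/sh
# Build a space-separated libvirt log filter string of the form
# "LEVEL:match LEVEL:match ..." for the given match strings.
# build_log_filters is a hypothetical helper name, not a libvirt tool.
build_log_filters() {
    level="$1"
    shift
    out=""
    for match in "$@"; do
        # Append with a separating space, except for the first entry.
        out="${out:+$out }${level}:${match}"
    done
    printf '%s\n' "$out"
}

# Example usage (assumption: the util and node_device sources are relevant):
#   sudo virt-admin daemon-log-filters "$(build_log_filters 1 util node_device)"
#   sudo virt-admin daemon-log-outputs "1:file:/var/log/libvirt/libvirtd.log"
#   virsh nodedev-dumpxml <the i350 device ID>
```

These runtime settings are not persistent, so they are gone again after the daemon restarts.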

#2. kernel
This is rather kernel dependent; it might be worth trying a few older/newer mainline kernels [2] to see if any of them already behaves differently.

#3. data
It would be great to report the actual VPD data that is exposed.
That would be
$ sudo cat /sys/devices/pci.../<ID>/<ID>/vpd
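
Since the vpd file is binary, a hex dump is probably easier to paste into a bug report than raw cat output. A minimal sketch using od; the function name dump_vpd_hex and the PCI address in the usage comment are placeholders of mine, not something taken from your system.

```shell
#!/bin/sh
# Print a file as hex bytes, one row per 16 bytes, with hex offsets.
# -v makes od print repeated rows instead of collapsing them into "*".
# dump_vpd_hex is a hypothetical helper name.
dump_vpd_hex() {
    od -A x -t x1 -v "$1"
}

# Example usage (0000:01:00.0 is a placeholder address; reading vpd needs root):
#   sudo sh -c 'od -A x -t x1 -v /sys/bus/pci/devices/0000:01:00.0/vpd' > i350-vpd.txt
```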

Until we (or Intel or the upstreams) have that data there is not much we can do, hence I'm setting the bug to Incomplete for now.

[1]: https://gitlab.com/libvirt/libvirt/-/issues
[2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
[3]: https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Debug.html