The function that allocates the _virPCIDeviceAddress structure is virPCIGetDeviceAddressFromSysfsLink().
It is called from virPCIGetPhysicalFunction(), which is itself called from nodeDeviceSysfsGetPCISRIOVCaps().
Since the call tree is rather large, I've split it into three parts: [0], [1] and [2].
In that terminology, the bottom function in each stack calls nodeDeviceSysfsGetPCIRelatedDevCaps(), which in turn calls nodeDeviceSysfsGetPCISRIOVCaps().
[0] src/node_device/node_device_driver.c
virNodeDeviceGetXMLDesc() [libvirt-nodedev.c]
virNodeDeviceDriver node_device_driver callback <.nodeDeviceGetXMLDesc(), in udev/hal/remote>
nodeDeviceGetXMLDesc()
update_caps()
We can skip path [2], given that HAL is deprecated and only udev is available on Ubuntu. Following path [1] does exercise the allocation of the structure in question, but on that path udevEventHandleCallback() also ends up deallocating it.
The trigger is the following:
while true; do udevadm trigger; done
This goes through the udev path ([1]) described above and allocates one instance of the _virPCIDeviceAddress structure per device; it is deallocated, though, as the following backtrace collected with gdb shows:
#0 virNodeDevCapsDefFree (caps=0x558d824d6ba0) at ../../../src/conf/node_device_conf.c:1709
#1 0x00007f7f1244db6c in virNodeDeviceDefFree (def=0x558d824c29c0) at ../../../src/conf/node_device_conf.c:146
#2 0x00007f7f1244fe99 in virNodeDeviceAssignDef (devs=0x7f7ee00e68b8, def=0x558d824cde50)
at ../../../src/conf/node_device_conf.c:182
#3 0x00007f7eebe1efae in udevAddOneDevice (device=device@entry=0x558d824d85e0)
at ../../../src/node_device/node_device_udev.c:1402
#4 0x00007f7eebe202f8 in udevEventHandleCallback (watch=watch@entry=7, fd=<optimized out>, events=events@entry=1,
data=data@entry=0x0) at ../../../src/node_device/node_device_udev.c:1546
#5 0x00007f7f12395cf0 in virEventPollDispatchHandles (fds=<optimized out>, nfds=<optimized out>)
at ../../../src/util/vireventpoll.c:509
#6 virEventPollRunOnce () at ../../../src/util/vireventpoll.c:658
#7 0x00007f7f12394332 in virEventRunDefaultImpl () at ../../../src/util/virevent.c:314
#8 0x00007f7f1250789d in virNetDaemonRun (dmn=0x558d8249f340) at ../../../src/rpc/virnetdaemon.c:701
That leaves path [0] as the remaining candidate for reproducing the issue. To exercise that path, we can run the following command:
while true; do virsh nodedev-dumpxml pci_0000_08_12_0 >/dev/null; done
It worked: Valgrind reported the following stack when the leak path was exercised:
16,752 bytes in 1,047 blocks are definitely lost in loss record 580 of 586
at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x50A6283: virAlloc (viralloc.c:144)
by 0x50F1282: virPCIGetDeviceAddressFromSysfsLink (virpci.c:2453)
by 0x50F1417: virPCIGetPhysicalFunction (virpci.c:2486)
by 0x2BD50141: nodeDeviceSysfsGetPCISRIOVCaps (node_device_linux_sysfs.c:157)
by 0x2BD50141: nodeDeviceSysfsGetPCIRelatedDevCaps (node_device_linux_sysfs.c:225)
by 0x2BD4F1D0: update_caps (node_device_driver.c:66)
by 0x2BD4F1D0: nodeDeviceGetXMLDesc (node_device_driver.c:346)
by 0x51C838E: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:292)
by 0x138D6E: remoteDispatchNodeDeviceGetXMLDesc (remote_dispatch.h:12746)
by 0x138D6E: remoteDispatchNodeDeviceGetXMLDescHelper (remote_dispatch.h:12720)
by 0x5213FE8: virNetServerProgramDispatchCall (virnetserverprogram.c:437)
by 0x5213FE8: virNetServerProgramDispatch (virnetserverprogram.c:307)
by 0x520F4F7: virNetServerProcessMsg (virnetserver.c:135)
by 0x520F4F7: virNetServerHandleJob (virnetserver.c:156)
by 0x5106565: virThreadPoolWorker (virthreadpool.c:145)
by 0x5105AE7: virThreadHelper (virthread.c:206)
This is from my libvirt 1.3.1 (Xenial) testing, but I've noticed the same in Bionic/Eoan testing. Also, 0000:08:12.0 is a Virtual Function, so the reproducer requires an SR-IOV capable PCI device.
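As a quick sanity check on the Valgrind summary line above, the average size of a leaked block can be computed from the byte and block counts. The following is only a sketch: the helper name bytes_per_block is hypothetical, and it assumes the summary line is passed without Valgrind's usual ==PID== prefix, as it appears in this report.

```shell
#!/bin/sh
# Average leaked block size from a Valgrind "definitely lost" summary line.
# bytes_per_block is a hypothetical helper, not part of Valgrind or libvirt.
bytes_per_block() {
    # Drop thousands separators so the counts can be used in arithmetic.
    line=$(printf '%s\n' "$1" | tr -d ',')
    # Field 1 is the byte count, field 4 is the block count.
    bytes=$(printf '%s\n' "$line" | awk '{print $1}')
    blocks=$(printf '%s\n' "$line" | awk '{print $4}')
    echo $((bytes / blocks))
}

# Summary line taken from the trace in this report.
bytes_per_block "16,752 bytes in 1,047 blocks are definitely lost in loss record 580 of 586"
```

For the numbers in this report the result is 16 bytes per block, i.e. every lost block has the same small size, which is what one would expect if a single fixed-size structure is leaked on each pass through the path.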
The virsh nodedev command is not the only way to exercise the leak path: OpenStack/Nova composes XML for PCI functions and can call the same libvirt API, causing the same effect.
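Since the reproducer needs a Virtual Function, a small helper can be used to find one on a given host. The sketch below relies on the standard Linux sysfs layout, where a VF's device directory contains a physfn symlink pointing at its Physical Function. The function names (nodedev_name, is_vf, list_vfs) are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# Helpers for picking a Virtual Function to feed the reproducer.
# Function names are hypothetical; the sysfs layout is the usual Linux one.

# Map a PCI address such as 0000:08:12.0 to the node device name
# used by virsh (pci_0000_08_12_0).
nodedev_name() {
    printf 'pci_%s\n' "$(printf '%s' "$1" | tr ':.' '__')"
}

# A Virtual Function has a "physfn" symlink in its sysfs directory.
is_vf() {
    [ -L "$1/physfn" ]
}

# Print the node device name of every VF found under a sysfs root
# (default: /sys/bus/pci/devices).
list_vfs() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        if is_vf "$dev"; then
            nodedev_name "$(basename "$dev")"
        fi
    done
}
```

With these in place, any name printed by list_vfs can be substituted for pci_0000_08_12_0 in the virsh nodedev-dumpxml loop shown earlier.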
[1] src/node_device/node_device_udev.c
virStateDriver udevStateDriver [callback register, stateInitialize()]
nodeStateInitialize()
udevEnumerateDevices()
udevEventHandleCallback() || udevProcessDeviceListEntry
udevAddOneDevice()
udevGetDeviceDetails()
udevProcessPCI()
[2] src/node_device/node_device_hal.c << deprecated >>
virStateDriver halStateDriver callbacks register [stateInitialize(), stateInitialize()]
libhal_ctx_set_device_added() |-> multiple calls in this file
nodeStateInitialize() || nodeStateReload || device_added() || dev_refresh()
dev_create() || libhal_ctx_set_device_new_capability() <HAL callback>
gather_capabilities() || device_cap_added()
gather_capability()
caps_tbl[VIR_NODE_DEV_CAP_PCI_DEV]() <gather_fn ptr>
gather_pci_cap()