2023-06-16 00:53:11 |
Rafael Lopez |
bug |
|
|
added bug |
2023-06-16 00:53:39 |
Rafael Lopez |
libvirt (Ubuntu): importance |
Undecided |
Medium |
|
2023-06-16 00:53:54 |
Rafael Lopez |
libvirt (Ubuntu): status |
New |
In Progress |
|
2023-06-16 00:55:04 |
Rafael Lopez |
nominated for series |
|
Ubuntu Jammy |
|
2023-06-16 00:55:04 |
Rafael Lopez |
bug task added |
|
libvirt (Ubuntu Jammy) |
|
2023-06-16 00:55:12 |
Rafael Lopez |
libvirt (Ubuntu Jammy): status |
New |
In Progress |
|
2023-06-16 00:55:14 |
Rafael Lopez |
libvirt (Ubuntu Jammy): importance |
Undecided |
Medium |
|
2023-06-16 00:55:47 |
Rafael Lopez |
description |
Memory grows over time, likely due to a memory leak in PCI data collection. It can only be reproduced in hardware environments, and may be particular to specific PCI devices that supply VPD data.
Valgrind stacks after a couple of hours:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
==3411871== 1,608,514 (134,160 direct, 1,474,354 indirect) bytes in 5,590 blocks are definitely lost in loss record 2,844 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0x4A2D075: virNodeDeviceCapsListExport (node_device_conf.c:2707)
==3411871== by 0xFC8D10F: nodeDeviceListCaps (node_device_driver.c:459)
==3411871== by 0x4B7EE68: virNodeDeviceListCaps (libvirt-nodedev.c:402)
==3411871== by 0x1554FE: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15688)
==3411871== by 0x1554FE: remoteDispatchNodeDeviceListCapsHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15655)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Possibly fixed by:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa.patch |
Memory grows over time, likely due to a memory leak in PCI data collection. It can only be reproduced in hardware environments, and may be particular to specific PCI devices that supply VPD data.
Only seen on Jammy so far.
Valgrind stacks after a couple of hours:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
==3411871== 1,608,514 (134,160 direct, 1,474,354 indirect) bytes in 5,590 blocks are definitely lost in loss record 2,844 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0x4A2D075: virNodeDeviceCapsListExport (node_device_conf.c:2707)
==3411871== by 0xFC8D10F: nodeDeviceListCaps (node_device_driver.c:459)
==3411871== by 0x4B7EE68: virNodeDeviceListCaps (libvirt-nodedev.c:402)
==3411871== by 0x1554FE: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15688)
==3411871== by 0x1554FE: remoteDispatchNodeDeviceListCapsHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15655)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Possibly fixed by:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa.patch |
|
2023-06-16 01:40:02 |
Rafael Lopez |
libvirt (Ubuntu Jammy): assignee |
|
Rafael Lopez (rafael.lopez) |
|
2023-06-20 00:02:40 |
Rafael Lopez |
description |
Memory grows over time, likely due to a memory leak in PCI data collection. It can only be reproduced in hardware environments, and may be particular to specific PCI devices that supply VPD data.
Only seen on Jammy so far.
Valgrind stacks after a couple of hours:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
==3411871== 1,608,514 (134,160 direct, 1,474,354 indirect) bytes in 5,590 blocks are definitely lost in loss record 2,844 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0x4A2D075: virNodeDeviceCapsListExport (node_device_conf.c:2707)
==3411871== by 0xFC8D10F: nodeDeviceListCaps (node_device_driver.c:459)
==3411871== by 0x4B7EE68: virNodeDeviceListCaps (libvirt-nodedev.c:402)
==3411871== by 0x1554FE: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15688)
==3411871== by 0x1554FE: remoteDispatchNodeDeviceListCapsHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15655)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Possibly fixed by:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa.patch |
[ Impact ]
Memory leak causing growing memory footprints in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This would impact many OpenStack deployments, and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by checking for the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if 'vpd' appears in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
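The sysfs check above can be scripted. The following is a minimal sketch; the function name and the parameterised base directory are illustrative choices, not part of this report:

```shell
# Hypothetical helper: list device directories under a sysfs-style base
# path that expose a 'vpd' attribute, i.e. the devices libvirt would
# read VPD data from. Defaults to the real PCI sysfs tree.
find_vpd_devices() {
    base="${1:-/sys/bus/pci/devices}"
    for dev in "$base"/*; do
        if [ -e "$dev/vpd" ]; then
            basename "$dev"
        fi
    done
}
```

On an affected host, calling `find_vpd_devices` with no argument should print the PCI addresses of any VPD-capable devices.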
It is easy to confirm by running libvirt in valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
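A valgrind run that captures stacks like the one above can be sketched as follows; the wrapper name is illustrative and the flags are typical memcheck options, not taken from this report. The libvirtd service must be stopped before running the resulting command, and the daemon will be considerably slower under valgrind:

```shell
# Illustrative wrapper: print the valgrind memcheck invocation for
# running libvirtd in the foreground with full leak reporting.
valgrind_cmd() {
    daemon="${1:-/usr/sbin/libvirtd}"
    echo "valgrind --leak-check=full --show-leak-kinds=definite" \
         "--num-callers=20 $daemon"
}
```

For example, `sudo systemctl stop libvirtd` followed by running the printed command (redirecting stderr to a log file) collects the leak records after a few hours of normal load.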
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present, as well as when it is fixed. Before the fix there is clear linear growth, which should flatten out after applying the patch.
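The memory monitoring step can be sketched with a small sampler; the function name and output format are illustrative assumptions, not part of this report:

```shell
# Illustrative sampler: print "<ISO timestamp> <total RSS in kB>" for all
# processes with the given name (default libvirtd). Run it periodically
# (e.g. from cron or a sleep loop) to get a growth curve that can be
# compared before and after the fix. Prints an RSS of 0 if no matching
# process is running.
sample_rss() {
    proc="${1:-libvirtd}"
    rss=$(ps -o rss= -C "$proc" | awk '{s += $1} END {print s + 0}')
    printf '%s %s\n' "$(date -Is)" "$rss"
}
```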
[ Where problems could occur ]
The functions changed are only called in environments where VPD devices exist, and the patch adjusts pointers and the contents of data structures related to VPD-capable PCI devices found by libvirt.
Problems would therefore be confined to environments where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa |
|
2023-06-20 02:06:34 |
Rafael Lopez |
attachment added |
|
lp-2024114-pcivpd-memleak-jammy.debdiff https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2024114/+attachment/5680882/+files/lp-2024114-pcivpd-memleak-jammy.debdiff |
|
2023-06-20 02:08:20 |
Rafael Lopez |
bug |
|
|
added subscriber sts-sponsors (DEACTIVATED; use se-sponsors) |
2023-06-20 02:08:26 |
Rafael Lopez |
removed subscriber sts-sponsors (DEACTIVATED; use se-sponsors) |
|
|
|
2023-06-20 02:09:01 |
Rafael Lopez |
bug |
|
|
added subscriber Support Engineering Sponsors |
2023-06-20 04:14:58 |
Ubuntu Foundations Team Bug Bot |
tags |
|
patch |
|
2023-06-20 04:15:02 |
Ubuntu Foundations Team Bug Bot |
bug |
|
|
added subscriber Ubuntu Sponsors |
2023-06-20 23:53:58 |
Rafael Lopez |
nominated for series |
|
Ubuntu Kinetic |
|
2023-06-20 23:53:58 |
Rafael Lopez |
bug task added |
|
libvirt (Ubuntu Kinetic) |
|
2023-06-20 23:54:05 |
Rafael Lopez |
libvirt (Ubuntu Kinetic): status |
New |
In Progress |
|
2023-06-20 23:54:08 |
Rafael Lopez |
libvirt (Ubuntu Kinetic): importance |
Undecided |
Medium |
|
2023-06-20 23:54:12 |
Rafael Lopez |
libvirt (Ubuntu Kinetic): assignee |
|
Rafael Lopez (rafael.lopez) |
|
2023-06-20 23:56:24 |
Rafael Lopez |
description |
[ Impact ]
Memory leak causing growing memory footprints in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This would impact many OpenStack deployments, and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by checking for the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if 'vpd' appears in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
It is easy to confirm by running libvirt in valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present, as well as when it is fixed. Before the fix there is clear linear growth, which should flatten out after applying the patch.
[ Where problems could occur ]
The functions changed are only called in environments where VPD devices exist, and the patch adjusts pointers and the contents of data structures related to VPD-capable PCI devices found by libvirt.
Problems would therefore be confined to environments where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa |
[ Impact ]
Memory leak causing growing memory footprints in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This would impact many OpenStack deployments, and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by checking for the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if 'vpd' appears in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
It is easy to confirm by running libvirt in valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present, as well as when it is fixed. Before the fix there is clear linear growth, which should flatten out after applying the patch.
[ Where problems could occur ]
The functions changed are only called in environments where VPD devices exist, and the patch adjusts pointers and the contents of data structures related to VPD-capable PCI devices found by libvirt.
Problems would therefore be confined to environments where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa
This commit is missing from Jammy and Kinetic, but present in Lunar+. The same issue has not been observed in a similar environment running Focal. |
|
2023-06-20 23:56:41 |
Rafael Lopez |
description |
[ Impact ]
Memory leak causing growing memory footprints in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This would impact many OpenStack deployments, and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by checking for the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if 'vpd' appears in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
It is easy to confirm by running libvirt in valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present, as well as when it is fixed. Before the fix there is clear linear growth, which should flatten out after applying the patch.
[ Where problems could occur ]
The functions changed are only called in environments where VPD devices exist, and the patch adjusts pointers and the contents of data structures related to VPD-capable PCI devices found by libvirt.
Problems would therefore be confined to environments where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa
This commit is missing from Jammy and Kinetic, but present in Lunar+. The same issue has not been observed in a similar environment running Focal. |
[ Impact ]
Memory leak causing growing memory footprints in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This would impact many OpenStack deployments, and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by checking for the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if 'vpd' appears in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
It is easy to confirm by running libvirt in valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present, as well as when it is fixed. Before the fix there is clear linear growth, which should flatten out after applying the patch.
[ Where problems could occur ]
The functions changed are only called in environments where VPD devices exist, and the patch adjusts pointers and the contents of data structures related to VPD-capable PCI devices found by libvirt.
Problems would therefore be confined to environments where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa
This patch is missing from Jammy and Kinetic, but present in Lunar+. The same issue has not been observed in a similar environment running Focal. |
|
2023-06-21 13:06:51 |
Junien F |
bug |
|
|
added subscriber The Canonical Sysadmins |
2023-06-21 18:34:19 |
Jeremy Bícha |
libvirt (Ubuntu): status |
In Progress |
Fix Released |
|
2023-06-21 18:34:59 |
Jeremy Bícha |
libvirt (Ubuntu Kinetic): status |
In Progress |
Triaged |
|
2023-06-21 18:42:13 |
Jeremy Bícha |
removed subscriber Ubuntu Sponsors |
|
|
|
2023-06-21 18:42:27 |
Jeremy Bícha |
bug |
|
|
added subscriber Jeremy Bícha |
2023-06-22 12:33:10 |
Heitor Alves de Siqueira |
removed subscriber Support Engineering Sponsors |
|
|
|
2023-06-22 12:33:20 |
Heitor Alves de Siqueira |
tags |
patch |
patch se-sponsor-halves |
|
2023-06-30 18:11:51 |
Andreas Hasenack |
bug watch added |
|
https://bugzilla.redhat.com/show_bug.cgi?id=2143235 |
|
2023-06-30 18:12:30 |
Andreas Hasenack |
libvirt (Ubuntu Jammy): status |
In Progress |
Fix Committed |
|
2023-06-30 18:12:32 |
Andreas Hasenack |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2023-06-30 18:12:36 |
Andreas Hasenack |
bug |
|
|
added subscriber SRU Verification |
2023-06-30 18:12:40 |
Andreas Hasenack |
tags |
patch se-sponsor-halves |
patch se-sponsor-halves verification-needed verification-needed-jammy |
|
2023-07-04 02:51:17 |
Rafael Lopez |
libvirt (Ubuntu Kinetic): status |
Triaged |
Won't Fix |
|
2023-07-11 23:46:06 |
Rafael Lopez |
description |
[ Impact ]
A memory leak causes a growing memory footprint in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This impacts many OpenStack deployments and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by looking at the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if you see 'vpd' in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
It is easy to confirm by running libvirt under valgrind, which will show a stack like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100)
Knowing the server has a VPD-capable device, monitoring memory consumption over time can show whether the issue is present as well as when it is fixed: before the fix there is clear linear growth, which should flatten out after the patch is applied.
[ Where problems could occur ]
The changed functions are only called in environments where VPD-capable devices exist, and the patch adjusts pointers and the contents of data structures describing VPD-capable PCI devices found by libvirt.
Problems would therefore surface only where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa
This patch is missing from Jammy and Kinetic, but present in Lunar+. The same issue has not been observed in a similar environment running Focal. |
[ Impact ]
A memory leak causes a growing memory footprint in long-running libvirt processes. In a fairly busy OpenStack environment, this showed steady linear growth up to ~15GB over a couple of months.
This impacts many OpenStack deployments and anyone else using libvirt with particular (VPD-capable) PCI devices, forcing them to restart libvirt regularly to reset memory consumption. The leak has so far only been observed in a hardware (metal) environment with Mellanox devices, but ostensibly occurs wherever a VPD-capable device exists.
[ Test Plan ]
It is only possible to reproduce this on certain hardware, seemingly hosts with PCI cards that present VPD (Vital Product Data). For example, this was noticed on a host where libvirt was obtaining data from a Mellanox card that presented VPD data. You can tell whether a PCI device presents VPD data by looking at the sysfs entry /sys/bus/pci/devices/{address}/vpd, or from `lshw` if you see 'vpd' in the list of capabilities, for example:
*-network:0
description: Ethernet interface
product: MT2892 Family [ConnectX-6 Dx]
vendor: Mellanox Technologies
...snip...
capabilities: pciexpress vpd msix pm bus_master cap_list rom ethernet physical 1000bt-fd 10000bt-fd 25000bt-fd 40000bt-fd autonegotiation
...snip...
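The sysfs check described above can be scripted. A minimal POSIX sh sketch (the function name list_vpd_devices and the directory argument are illustrative, not part of the patch; point it at /sys/bus/pci/devices on a real host):

```shell
# List PCI devices that expose Vital Product Data by checking for a
# 'vpd' file under each device directory in the given sysfs path.
list_vpd_devices() {
    base="$1"
    for dev in "$base"/*; do
        # A device is VPD capable if its directory contains a 'vpd' file
        [ -e "$dev/vpd" ] && basename "$dev"
    done
}

# On real hardware:
# list_vpd_devices /sys/bus/pci/devices
```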
1. Knowing the server has a VPD-capable device, monitor memory consumption over time: before the fix there is clear linear growth, which should flatten out after the patch is applied.
2. Another simple test: run "virsh nodedev-list" 1000 times and check the memory occupied by the libvirtd service.
#!/bin/sh
systemctl start libvirtd
systemctl status libvirtd
i=0
while [ $i -ne 1000 ]
do
    virsh nodedev-list
    i=$((i+1))
    echo "$i"
done
systemctl status libvirtd
systemctl status libvirtd
and watch the "Memory:" field grow (or not, if the fix is there).
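To quantify the growth rather than eyeballing `systemctl status`, the daemon's resident set size can be sampled from /proc. A sketch under the assumption that libvirtd is running and findable with pidof (rss_kb is a hypothetical helper, not an existing tool):

```shell
# Print a process's VmRSS (resident set size, in kB) from /proc.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Sample libvirtd's RSS before and after the 1000 nodedev-list calls:
# before=$(rss_kb "$(pidof libvirtd)")
# ... run the loop above ...
# after=$(rss_kb "$(pidof libvirtd)")
# echo "growth: $((after - before)) kB"
```

With the leak present, the delta keeps climbing on repeated runs; with the fix, it should level off.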
[ Where problems could occur ]
The changed functions are only called in environments where VPD-capable devices exist, and the patch adjusts pointers and the contents of data structures describing VPD-capable PCI devices found by libvirt.
Problems would therefore surface only where VPD-capable devices are present, and could show up as garbage data about a device, null pointers where there should be data, or segfaults.
[ Other Info ]
The backport is derived from an upstream fix:
https://github.com/libvirt/libvirt/commit/64d32118540aca3d42bc5ee21c8b780cafe04bfa
This patch is missing from Jammy and Kinetic, but present in Lunar+. The same issue has not been observed in a similar environment running Focal.
Running libvirt under valgrind will show stacks like the following:
==3411871== 7,559,541 (407,160 direct, 7,152,381 indirect) bytes in 16,965 blocks are definitely lost in loss record 2,846 of 2,846
==3411871== at 0x484DA83: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==3411871== by 0x4D53C50: g_malloc0 (gmem.c:161)
==3411871== by 0x49A2832: virPCIVPDParse (virpcivpd.c:672)
==3411871== by 0x4983BD8: virPCIDeviceGetVPD (virpci.c:2694)
==3411871== by 0x4A2CEB7: UnknownInlinedFun (node_device_conf.c:3032)
==3411871== by 0x4A2CEB7: virNodeDeviceGetPCIDynamicCaps (node_device_conf.c:3065)
==3411871== by 0x4A2D03D: virNodeDeviceUpdateCaps (node_device_conf.c:2636)
==3411871== by 0xFC8CD35: nodeDeviceGetXMLDesc (node_device_driver.c:370)
==3411871== by 0x4B7E9D1: virNodeDeviceGetXMLDesc (libvirt-nodedev.c:275)
==3411871== by 0x15519A: UnknownInlinedFun (remote_daemon_dispatch_stubs.h:15507)
==3411871== by 0x15519A: remoteDispatchNodeDeviceGetXMLDescHelper.lto_priv.0 (remote_daemon_dispatch_stubs.h:15484)
==3411871== by 0x4A59785: UnknownInlinedFun (virnetserverprogram.c:428)
==3411871== by 0x4A59785: virNetServerProgramDispatch (virnetserverprogram.c:302)
==3411871== by 0x4A60067: UnknownInlinedFun (virnetserver.c:140)
==3411871== by 0x4A60067: virNetServerHandleJob (virnetserver.c:160)
==3411871== by 0x499B982: virThreadPoolWorker (virthreadpool.c:164)
==3411871== by 0x499A4D8: virThreadHelper (virthread.c:241)
==3411871== by 0x514CB42: start_thread (pthread_create.c:442)
==3411871== by 0x51DDBB3: clone (clone.S:100) |
|
2023-07-12 00:16:30 |
Rafael Lopez |
tags |
patch se-sponsor-halves verification-needed verification-needed-jammy |
patch se-sponsor-halves verification-done-jammy |
|
2023-07-12 10:39:04 |
Robie Basak |
removed subscriber Ubuntu Stable Release Updates Team |
|
|
|
2023-07-12 10:39:03 |
Launchpad Janitor |
libvirt (Ubuntu Jammy): status |
Fix Committed |
Fix Released |
|