Memory leak of struct _virPCIDeviceAddress on libvirt
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Ubuntu Cloud Archive |
Fix Released
|
Undecided
|
Guilherme G. Piccoli | ||
Mitaka |
Fix Released
|
Undecided
|
Guilherme G. Piccoli | ||
Queens |
Fix Released
|
Undecided
|
Guilherme G. Piccoli | ||
libvirt (Ubuntu) |
Fix Released
|
High
|
Guilherme G. Piccoli | ||
Xenial |
Fix Released
|
High
|
Guilherme G. Piccoli | ||
Bionic |
Fix Released
|
High
|
Guilherme G. Piccoli | ||
Eoan |
Fix Released
|
High
|
Guilherme G. Piccoli | ||
Focal |
Fix Released
|
High
|
Guilherme G. Piccoli |
Bug Description
[Impact]
* There's a long-term memory leak in libvirt related to the PCI information gathering from sysfs in Linux, specially related with SR-IOV devices. This was fixed by commit 38816336 ("node_device_conf: Don't leak @physical_function in virNodeDeviceGe
* In comment #9 there is a detailed explanation of what's going on, but the summary is that the variable physical_function (member of a PCI structure), of type _virPCIDeviceAd
* The impact of the issue is a memory leak usually small but that may grow bigger depending on the amount of PCI devices and how/when they are enumerated by libvirt; if some user of those functions are actively exercising the leak path it may become a problem (OOM situation).
[Test Case]
* The basic testing done to exercise the memory leak path was running the virsh tool to generate the XML output of a SR-IOV PCI device in a loop, like:
while true; do virsh nodedev-dumpxml pci_0000_08_12_0 >/dev/null; done
* This was executed while Valgrind was used to debug libvirtd, in order to collect the signature of the leak. Without the patch we get the "definitely lost" type of leak with the PCI backtrace (on comment #9), whereas with the patch we don't see the leak anymore.
[Regression Potential]
* The potential of regressions is really low - the fix is upstream for a while and in Focal package, and it is self-contained and not intrusive. Considering hypothetical scenarios, if there's an issue with the fix it should come in form of unused memory or double-free (which is usually harmless), and only in PCI enumeration (or PCI XML generation) paths.
Changed in cloud-archive: | |
assignee: | nobody → Guilherme G. Piccoli (gpiccoli) |
status: | New → Confirmed |
summary: |
- Memory leak on libvirt 1.3.1 + Memory leak of struct _virPCIDeviceAddress on libvirt |
Changed in libvirt (Ubuntu Focal): | |
status: | Confirmed → Fix Released |
Changed in cloud-archive: | |
status: | Confirmed → In Progress |
Changed in cloud-archive: | |
status: | In Progress → Fix Committed |
Changed in cloud-archive: | |
status: | Fix Committed → Fix Released |
Leak #1:
==12891== 80 bytes in 1 blocks are definitely lost in loss record 949 of 1,360 valgrind/ vgpreload_ memcheck- amd64-linux. so)
==12891== at 0x4C2CC70: calloc (in /usr/lib/
==12891== by 0x50A5299: virAlloc (viralloc.c:144)
==12891== by 0x50C1C63: virLastErrorObject (virerror.c:240)
==12891== by 0x50C4F88: virResetLastError (virerror.c:412)
==12891== by 0x51B786C: virConnectOpen (libvirt.c:1139)
From pahole:
struct _virError {
virErrorLevel level; /* 0x10 0x4 */
int code; /* 0 0x4 */
int domain; /* 0x4 0x4 */
char * message; /* 0x8 0x8 */
/* XXX 4 bytes hole, try to pack */
char * str1; /* 0x28 0x8 */
char * str2; /* 0x30 0x8 */
char * str3; /* 0x38 0x8 */
/* --- cacheline 1 boundary (64 bytes) --- */
int int1; /* 0x40 0x4 */
int int2; /* 0x44 0x4 */
/* size: 80, cachelines: 2, members: 12 */
/* sum members: 76, holes: 1, sum holes: 4 */
/* last cacheline: 16 bytes */
};
The _virError struct (in the form of virErrorPtr typedef) is expected to be freed ata(), which is a thread "destructor" set in the pthread
in virLastErrFreeD
creation. It should be called when thread exists (by pthread_exit() or something
analog), but can be skipped if process main() function returns.
So, hypothesis here are:
1) There was a process exist that led to this thread data getting leaked
2) Valgrind should be ended (with SIGTERM) in order to collect the thread
destructor execution, so this is a false/temporary leak.