2021-09-14 06:06:11 |
Matthew Ruffell |
description |
The latest libvirtd (6.0.0-0ubuntu8.13) crashes with the stack trace below when trying to bring up network pools. I tracked the problem down to the newly added patch (lp-1892132-Add-phys_port_name-support-on-virPCIGetNetName.patch). Assigning *netname = firstEntryName; without clearing firstEntryName results in memory corruption (a double free). Following mainline, I changed it to the following:
*netname = g_steal_pointer(&firstEntryName);
or you can keep the original assignment and add the following line after it:
firstEntryName = NULL;
Either change solves the problem.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f40e5d1c859 in __GI_abort () at abort.c:79
#2 0x00007f40e5d873ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f40e5eb1285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007f40e5d8f47c in malloc_printerr (str=str@entry=0x7f40e5eb35d0 "free(): double free detected in tcache 2") at malloc.c:5347
#4 0x00007f40e5d910ed in _int_free (av=0x7f40c8000020, p=0x7f40c80079e0, have_lock=0) at malloc.c:4201
#5 0x00007f40e61a9a4f in virFree (ptrptr=0x7f40c8003b60) at ../../../src/util/viralloc.c:348
#6 0x00007f40dd0cf8b1 in networkCreateInterfacePool (netdef=0x7f40840187f0) at ../../../src/network/bridge_driver.c:2849
#7 0x00007f40dd0d799c in networkStartNetworkExternal (obj=0x7f408400f720) at ../../../src/network/bridge_driver.c:2938
#8 networkStartNetwork (driver=driver@entry=0x7f408400a7a0, obj=0x7f408400f720) at ../../../src/network/bridge_driver.c:2938
#9 0x00007f40dd0d854d in networkCreate (net=0x7f40c8000c60) at ../../../src/network/bridge_driver.c:4013
#10 0x00007f40e63fac3f in virNetworkCreate (network=network@entry=0x7f40c8000c60) at ../../../src/libvirt-network.c:585
#11 0x0000560240e255d1 in remoteDispatchNetworkCreate (server=0x560240ea4280, msg=0x560240ee8200, args=0x7f40c8000c40, rerr=0x7f40e00ec9a0, client=<optimized out>) at ./remote/remote_daemon_dispatch_stubs.h:13570
#12 remoteDispatchNetworkCreateHelper (server=0x560240ea4280, client=<optimized out>, msg=0x560240ee8200, rerr=0x7f40e00ec9a0, args=0x7f40c8000c40, ret=0x0) at ./remote/remote_daemon_dispatch_stubs.h:13549
#13 0x00007f40e630c970 in virNetServerProgramDispatchCall (msg=0x560240ee8200, client=0x560240eea270, server=0x560240ea4280, prog=0x560240ee1520) at ../../../src/rpc/virnetserverprogram.c:430
#14 virNetServerProgramDispatch (prog=0x560240ee1520, server=server@entry=0x560240ea4280, client=0x560240eea270, msg=0x560240ee8200) at ../../../src/rpc/virnetserverprogram.c:302
#15 0x00007f40e6311c2c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x560240ea4280) at ../../../src/rpc/virnetserver.c:136
#16 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x560240ea4280) at ../../../src/rpc/virnetserver.c:153
#17 0x00007f40e62301af in virThreadPoolWorker (opaque=opaque@entry=0x560240e885f0) at ../../../src/util/virthreadpool.c:163
#18 0x00007f40e622f51c in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
#19 0x00007f40e5ef2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f40e5e19293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 |
[Impact]
A regression was introduced in libvirt 6.0.0-0ubuntu8.13 for Focal that affects users who use SR-IOV to pass through VF devices to KVM guests.
The problem was introduced by the recently added lp-1892132-Add-phys_port_name-support-on-virPCIGetNetName.patch, which changes how virPCIGetNetName() fetches the name of the underlying VF device, so it can be used to send netlink commands.
There is a fallback case: we record the name of the first device entry at the beginning, and if all other lookups fail, we simply return that first name.
In libvirt 6.0.0-0ubuntu8.13, the line that clears firstEntryName after its value is handed to *netname was mistakenly removed:
- if (firstEntryName) {
- *netname = firstEntryName;
- firstEntryName = NULL;
- ret = 0;
+ if (firstEntryName) {
+ *netname = firstEntryName;
+ ret = 0;
This results in a double free, because *netname and firstEntryName still point to the same buffer and both are freed. It produces the following gdb trace:
#1 0x00007f40e5d1c859 in __GI_abort () at abort.c:79
#2 0x00007f40e5d873ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f40e5eb1285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007f40e5d8f47c in malloc_printerr (str=str@entry=0x7f40e5eb35d0 "free(): double free detected in tcache 2") at malloc.c:5347
#4 0x00007f40e5d910ed in _int_free (av=0x7f40c8000020, p=0x7f40c80079e0, have_lock=0) at malloc.c:4201
#5 0x00007f40e61a9a4f in virFree (ptrptr=0x7f40c8003b60) at ../../../src/util/viralloc.c:348
#6 0x00007f40dd0cf8b1 in networkCreateInterfacePool (netdef=0x7f40840187f0) at ../../../src/network/bridge_driver.c:2849
#7 0x00007f40dd0d799c in networkStartNetworkExternal (obj=0x7f408400f720) at ../../../src/network/bridge_driver.c:2938
#8 networkStartNetwork (driver=driver@entry=0x7f408400a7a0, obj=0x7f408400f720) at ../../../src/network/bridge_driver.c:2938
#9 0x00007f40dd0d854d in networkCreate (net=0x7f40c8000c60) at ../../../src/network/bridge_driver.c:4013
#10 0x00007f40e63fac3f in virNetworkCreate (network=network@entry=0x7f40c8000c60) at ../../../src/libvirt-network.c:585
#11 0x0000560240e255d1 in remoteDispatchNetworkCreate (server=0x560240ea4280, msg=0x560240ee8200, args=0x7f40c8000c40, rerr=0x7f40e00ec9a0, client=<optimized out>) at ./remote/remote_daemon_dispatch_stubs.h:13570
#12 remoteDispatchNetworkCreateHelper (server=0x560240ea4280, client=<optimized out>, msg=0x560240ee8200, rerr=0x7f40e00ec9a0, args=0x7f40c8000c40, ret=0x0) at ./remote/remote_daemon_dispatch_stubs.h:13549
#13 0x00007f40e630c970 in virNetServerProgramDispatchCall (msg=0x560240ee8200, client=0x560240eea270, server=0x560240ea4280, prog=0x560240ee1520) at ../../../src/rpc/virnetserverprogram.c:430
#14 virNetServerProgramDispatch (prog=0x560240ee1520, server=server@entry=0x560240ea4280, client=0x560240eea270, msg=0x560240ee8200) at ../../../src/rpc/virnetserverprogram.c:302
#15 0x00007f40e6311c2c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x560240ea4280) at ../../../src/rpc/virnetserver.c:136
#16 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x560240ea4280) at ../../../src/rpc/virnetserver.c:153
#17 0x00007f40e62301af in virThreadPoolWorker (opaque=opaque@entry=0x560240e885f0) at ../../../src/util/virthreadpool.c:163
#18 0x00007f40e622f51c in virThreadHelper (data=<optimized out>) at ../../../src/util/virthread.c:196
#19 0x00007f40e5ef2609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f40e5e19293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The fix is either to restore firstEntryName = NULL; as before, or to use the upstream call g_steal_pointer(&firstEntryName); which has the same effect. For reference, g_steal_pointer() is defined as:
static inline gpointer
g_steal_pointer (gpointer pp)
{
  gpointer *ptr = (gpointer *) pp;
  gpointer ref;

  ref = *ptr;
  *ptr = NULL;

  return ref;
}
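The ownership problem can be reproduced outside libvirt. The following is a minimal sketch in plain C, not the libvirt code: pick_name(), dup_string() and steal_string() are hypothetical names invented for this illustration, and steal_string() is a string-only stand-in for g_steal_pointer(). It mirrors the pattern described above: remember the first entry, fall back to it when nothing matches, and free the local copy unconditionally on the way out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap-allocated copy of s, or NULL on allocation failure. */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Hypothetical stand-in for g_steal_pointer(): return *pp and reset it to NULL. */
static char *steal_string(char **pp)
{
    char *ref = *pp;

    *pp = NULL;
    return ref;
}

/*
 * Hypothetical lookup: store in *name a copy of the entry equal to `wanted`,
 * or fall back to the first entry if nothing matches.
 * Returns 0 on success (caller must free(*name)), -1 on failure.
 */
static int pick_name(const char *const *entries, size_t n,
                     const char *wanted, char **name)
{
    char *first = NULL; /* fallback candidate, owned by this function */
    int ret = -1;
    size_t i;

    *name = NULL;

    for (i = 0; i < n; i++) {
        if (first == NULL) {
            first = dup_string(entries[i]);
            if (first == NULL)
                goto cleanup;
        }
        if (wanted != NULL && strcmp(entries[i], wanted) == 0) {
            *name = dup_string(entries[i]);
            if (*name != NULL)
                ret = 0;
            goto cleanup;
        }
    }

    if (first != NULL) {
        /*
         * Transfer ownership to the caller. Writing `*name = first;`
         * without clearing `first` would let the free() below release
         * the buffer the caller is about to use and free again.
         */
        *name = steal_string(&first);
        assert(first == NULL);
        ret = 0;
    }

cleanup:
    free(first); /* no-op after the steal, since first is NULL */
    return ret;
}
```

A caller passes an array of candidate names, checks the return value, and frees *name exactly once. Because the local pointer is NULL after the steal, the unconditional free() in the cleanup path stays safe on every exit.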
[Testcase]
Deploy a machine with a NIC that supports SR-IOV. Note that only particular NICs will reach the end of virPCIGetNetName().
Install KVM stack:
$ sudo apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils
Edit /etc/default/grub and add "intel_iommu=on" to the kernel command line.
$ sudo update-grub
$ sudo reboot
Create the VFs via the sysfs node:
$ sudo -s
# cat /sys/class/net/eno49/device/sriov_totalvfs
63
# echo '7' > /sys/class/net/eno49/device/sriov_numvfs
Next, define a libvirt network. Save the following in /tmp/passthrough.xml, changing "eno49" to your network interface:
<network>
<name>passthrough</name>
<forward mode='hostdev' managed='yes'>
<pf dev='eno49'/>
</forward>
</network>
$ virsh net-define /tmp/passthrough.xml
$ virsh net-autostart passthrough
$ virsh net-start passthrough
Add an AppArmor rule so that QEMU can access the VFIO device for our VF.
Edit /etc/apparmor.d/local/abstractions/libvirt-qemu
and add the line:
/dev/vfio/* rw,
Then restart AppArmor:
$ sudo systemctl restart apparmor.service
Next make a Focal VM:
$ sudo apt install uvtool-libvirt
$ ssh-keygen
$ uvt-simplestreams-libvirt sync release=focal arch=amd64
$ uvt-kvm create --cpu 4 --memory 4096 --disk 8 [ --password insecure ] focal-vm release=focal arch=amd64
$ uvt-kvm wait focal-vm
$ uvt-kvm ssh focal-vm # for ssh, key-based authentication.
$ virsh console focal-vm # for serial console, user ubuntu, password above.
Next, shut down the VM and edit its domain XML:
$ virsh shutdown focal-vm
$ virsh edit focal-vm
Add:
<interface type='network'>
<source network='passthrough'/>
</interface>
Save, then start the VM again:
$ virsh start focal-vm
[Where problems could occur]
If a regression were to occur, it would affect users who use SR-IOV to pass through VF devices into KVM guests, which is a large number of our enterprise users.
The fix is a single-line change that simply restores a line that was mistakenly removed. The change should be safe. |
|