[Zesty] libnl3 Segmentation fault in sriov environments

Bug #1673491 reported by Talat Batheesh
This bug affects 1 person
Affects            Status        Importance  Assigned to
libnl3 (Ubuntu)    Fix Released  Critical    Unassigned
libvirt (Ubuntu)   Invalid       Undecided   Unassigned

Bug Description

In Ubuntu 17.04 we can't open virt-manager; it crashes with a segfault.

This issue started when we updated from kernel 4.10.0-9-generic to 4.10.0-11-generic.

root@:~# /usr/sbin/libvirtd
Segmentation fault (core dumped)

root@:~# gdb /usr/sbin/libvirtd ./core
GNU gdb (Ubuntu 7.12.50.20170314-0ubuntu1) 7.12.50.20170314-git
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/libvirtd...(no debugging symbols found)...done.
[New LWP 4232]
[New LWP 4216]
[New LWP 4219]
[New LWP 4218]
[New LWP 4231]
[New LWP 4215]
[New LWP 4220]
[New LWP 4224]
[New LWP 4221]
[New LWP 4223]
[New LWP 4222]
[New LWP 4217]
[New LWP 4225]
[New LWP 4227]
[New LWP 4230]
[New LWP 4229]
[New LWP 4228]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/sbin/libvirtd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f97fa93367f in rtnl_link_sriov_parse_vflist () from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200
[Current thread is 1 (Thread 0x7f97dff94700 (LWP 4232))]
(gdb) where
#0 0x00007f97fa93367f in rtnl_link_sriov_parse_vflist () from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200
#1 0x00007f97fa927fad in ?? () from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200
#2 0x00007f980bfb76c3 in nl_cache_parse () from /lib/x86_64-linux-gnu/libnl-3.so.200
#3 0x00007f980bfb770b in ?? () from /lib/x86_64-linux-gnu/libnl-3.so.200
#4 0x00007f980bfbdc1c in nl_recvmsgs_report () from /lib/x86_64-linux-gnu/libnl-3.so.200
#5 0x00007f980bfbe049 in nl_recvmsgs () from /lib/x86_64-linux-gnu/libnl-3.so.200
#6 0x00007f980bfb6aab in ?? () from /lib/x86_64-linux-gnu/libnl-3.so.200
#7 0x00007f980bfb763d in nl_cache_pickup () from /lib/x86_64-linux-gnu/libnl-3.so.200
#8 0x00007f980bfb7871 in nl_cache_refill () from /lib/x86_64-linux-gnu/libnl-3.so.200
#9 0x00007f97fa926975 in rtnl_link_alloc_cache_flags () from /usr/lib/x86_64-linux-gnu/libnl-route-3.so.200
#10 0x00007f97fb224aff in ?? () from /usr/lib/x86_64-linux-gnu/libnetcf.so.1
#11 0x00007f97fb225e57 in ?? () from /usr/lib/x86_64-linux-gnu/libnetcf.so.1
#12 0x00007f97fb43699a in ?? () from /usr/lib/libvirt/connection-driver/libvirt_driver_interface.so
#13 0x00007f980d1414cf in virStateInitialize () from /usr/lib/x86_64-linux-gnu/libvirt.so.0
#14 0x000055b60ab0603b in ?? ()
#15 0x00007f980d09f892 in ?? () from /usr/lib/x86_64-linux-gnu/libvirt.so.0
#16 0x00007f980c9ab6ca in start_thread (arg=0x7f97dff94700) at pthread_create.c:333
#17 0x00007f980c6e50ff in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
(gdb) quit

root@:~# cat /etc/os-release
NAME="Ubuntu"
VERSION="17.04 (Zesty Zapus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Zesty Zapus (development branch)"
VERSION_ID="17.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=zesty
UBUNTU_CODENAME=zesty
root@:~# uname -r
4.10.0-11-generic

dmesg

Mar 16 16:26:46 kernel: [179539.047407] virt-manager[32956]: segfault at 0 ip 00007f7d98765889 sp 00007ffe38c41d90 error 4 in libgtk-3.so.0.2200.7[7f7d98482000+700000]
Mar 16 16:27:00 ntpd[2994]: message repeated 2 times: [ Soliciting pool server 109.226.40.40]
Mar 16 16:27:01 ntpd[2994]: Soliciting pool server 2402:f000:1:416:166:111:206:172
Mar 16 16:27:23 ntpd[2994]: Soliciting pool server 2001:67c:1560:8003::c7
Mar 16 16:27:38 ntpd[2994]: Soliciting pool server 109.226.40.40

Mar 16 16:33:16 kernel: [ 25.268014] libvirtd[3045]: segfault at 7fe8218ca9e0 ip 00007fe83c26867f sp 00007fe8218c84c0 error 7 in libnl-route-3.so.200.24.0[7fe83c23d000+6b000]
Mar 16 16:33:16 systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 16 16:33:16 systemd[1]: libvirtd.service: Unit entered failed state.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 16 16:33:16 nis[2877]: ...done.
Mar 16 16:33:16 systemd[1]: Started LSB: Start NIS client and server daemons..
Mar 16 16:33:16 systemd[1]: Starting Automounts filesystems on demand...
Mar 16 16:33:16 systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.
Mar 16 16:33:16 systemd[1]: apparmor.service: Cannot add dependency job, ignoring: Unit apparmor.service is masked.
Mar 16 16:33:16 systemd[1]: Stopped Virtualization daemon.
Mar 16 16:33:16 systemd[1]: Starting Virtualization daemon...
Mar 16 16:33:16 systemd[1]: Started Virtualization daemon.
Mar 16 16:33:16 kernel: [ 25.491389] libvirtd[3092]: segfault at 7f7a4c9309e0 ip 00007f7a672ce67f sp 00007f7a4c92e4c0 error 7 in libnl-route-3.so.200.24.0[7f7a672a3000+6b000]
Mar 16 16:33:16 systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 16 16:33:16 systemd[1]: libvirtd.service: Unit entered failed state.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 16 16:33:16 sm-mta[3021]: gethostbyaddr(63.1.2.64) failed: 1
Mar 16 16:33:16 systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.
Mar 16 16:33:16 systemd[1]: apparmor.service: Cannot add dependency job, ignoring: Unit apparmor.service is masked.
Mar 16 16:33:16 systemd[1]: Stopped Virtualization daemon.
Mar 16 16:33:16 systemd[1]: Starting Virtualization daemon...
Mar 16 16:33:16 systemd[1]: Started Virtualization daemon.
Mar 16 16:33:16 systemd[1]: Started Automounts filesystems on demand.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 16 16:33:16 systemd[1]: libvirtd.service: Unit entered failed state.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 16 16:33:16 kernel: [ 25.689266] libvirtd[3121]: segfault at 7f0022f459e0 ip 00007f003d8e367f sp 00007f0022f434c0 error 7 in libnl-route-3.so.200.24.0[7f003d8b8000+6b000]
Mar 16 16:33:16 sm-mta[3021]: gethostbyaddr(64.2.2.64) failed: 1
Mar 16 16:33:16 sm-mta[3126]: starting daemon (8.15.2): SMTP+queueing@00:10:00
Mar 16 16:33:16 systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.
Mar 16 16:33:16 systemd[1]: apparmor.service: Cannot add dependency job, ignoring: Unit apparmor.service is masked.
Mar 16 16:33:16 systemd[1]: Stopped Virtualization daemon.
Mar 16 16:33:16 systemd[1]: Starting Virtualization daemon...
Mar 16 16:33:16 systemd[1]: Started Virtualization daemon.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 16 16:33:16 systemd[1]: libvirtd.service: Unit entered failed state.
Mar 16 16:33:16 systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 16 16:33:16 kernel: [ 26.124935] libvirtd[3144]: segfault at 7f1eff5c39e0 ip 00007f1f19f6167f sp 00007f1eff5c14c0 error 7 in libnl-route-3.so.200.24.0[7f1f19f36000+6b000]
Mar 16 16:33:17 ntpd[2923]: Soliciting pool server 109.226.40.40
Mar 16 16:33:17 systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.
Mar 16 16:33:17 systemd[1]: apparmor.service: Cannot add dependency job, ignoring: Unit apparmor.service is masked.
Mar 16 16:33:17 systemd[1]: Stopped Virtualization daemon.
Mar 16 16:33:17 systemd[1]: Starting Virtualization daemon...
Mar 16 16:33:17 systemd[1]: Started Virtualization daemon.
Mar 16 16:33:17 kernel: [ 26.563048] libvirtd[3164]: segfault at 7fc19895c9e0 ip 00007fc1b32fa67f sp 00007fc19895a4c0 error 7 in libnl-route-3.so.200.24.0[7fc1b32cf000+6b000]
Mar 16 16:33:17 systemd[1]: libvirtd.service: Main process exited, code=killed, status=11/SEGV
Mar 16 16:33:17 systemd[1]: libvirtd.service: Unit entered failed state.
Mar 16 16:33:17 systemd[1]: libvirtd.service: Failed with result 'signal'.
Mar 16 16:33:17 systemd[1]: libvirtd.service: Service hold-off time over, scheduling restart.

root@:~# dpkg --list |grep libnl
ii libnl-3-200:amd64 3.2.29-0ubuntu1 amd64 library for dealing with netlink sockets
ii libnl-genl-3-200:amd64 3.2.29-0ubuntu1 amd64 library for dealing with netlink sockets - generic netlink
ii libnl-route-3-200:amd64 3.2.29-0ubuntu1 amd64 library for dealing with netlink sockets - route interface

See this GitHub issue. I think the problem is in the function rtnl_link_sriov_parse_vflist, which exists in libnl3_2_29:
https://github.com/thom311/libnl/issues/126

Christian Ehrhardt  (paelzer) wrote :

The trigger is libvirt (or other programs, per the GitHub issue), but the fix belongs in libnl, so I'm adding a libnl3 task.

Changed in libnl3 (Ubuntu):
importance: Undecided → Critical
Christian Ehrhardt  (paelzer) wrote :

Talat, thank you for your report and for already finding the GitHub issue!

tags: added: server-next
Talat Batheesh (talat-b87) wrote :

Hi,

Thanks for looking at the bug and taking the time to help improve the package.
This issue is a showstopper for our products. I would appreciate it if you could prepare a package with the fix so that we can test and verify it.

Thanks,
Talat

Christian Ehrhardt  (paelzer) wrote :

We are ahead of Debian, so we can't sync a fix from there at the moment.
I also don't have the upload rights to fully fix it myself.

The upstream merge has two commits:
* faf9d90 sriov: merge branch 'sriov-crash-issue126'
| * 20ed636 sriov: avoid buffer overrun in rtnl_link_sriov_parse_vflist()
| * 2d11f40 sriov: fix crash in rtnl_link_sriov_parse_vflist

I prepared a fix that you can test from this PPA: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/2603

The DEP-8 tests are still running there.
I'd also like Talat to verify the fix from the PPA; after that it should be considered as a Zesty fix.

@xnox, you uploaded the base version; could you take a look and sponsor the Bileto ticket if it is appropriate?

summary: - [Zesty] libvirtd Segmentation fault after running virt-manager
+ [Zesty] libnl3 Segmentation fault in sriov environments
Talat Batheesh (talat-b87) wrote :

Thank you,
The fix works; I validated it on a machine with an SR-IOV environment.
Dimitri, could you please add these fixes to the Zesty release?

Thanks,
Talat

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libnl3 - 3.2.29-0ubuntu2

---------------
libnl3 (3.2.29-0ubuntu2) zesty; urgency=medium

  * Fix crash in sriov environments (LP: #1673491)
    Applying patches of upstream issue #126, see the link for more:
    https://github.com/thom311/libnl/issues/126
    - d/p/u/sriov-avoid-buffer-overrun-in-rtnl_link_sriov_parse_vflist.patch
    - d/p/u/sriov-fix-crash-in-rtnl_link_sriov_parse_vflist.patch

 -- Christian Ehrhardt <email address hidden> Fri, 17 Mar 2017 10:12:17 +0100

Changed in libnl3 (Ubuntu):
status: New → Fix Released
Christian Ehrhardt  (paelzer) wrote :

Dimitri was not available, but I found another sponsor.
Thanks APW!

@Talat, it would be nice if you could confirm that the released version works as well.

Talat Batheesh (talat-b87) wrote :

Thank you,
The update passed, and libnl was updated to libnl-3-200:amd64 (3.2.29-0ubuntu2).

Christian Ehrhardt  (paelzer) wrote :

Since libvirt was only the trigger, I'm marking its task here as Invalid.

Changed in libvirt (Ubuntu):
status: New → Invalid
tags: removed: server-next