Appears running:
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 26330 26263 20 0 7588 980 - R+ pts/2 33:52 \_ ethtool -L eth1 combined 3
Anything that touches the process seems to get affected too, e.g. ltrace/strace get stuck as well.
Meanwhile the guest log on the virsh console shows soft lockups:
[ 568.394870] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]
[ 575.418868] INFO: rcu_sched self-detected stall on CPU
[ 575.419674] 0-...: (14999 ticks this GP) idle=66d/140000000000001/0 softirq=21127/21127 fqs=14994
[ 575.420779] (t=15000 jiffies g=11093 c=11092 q=9690)
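When comparing several runs, the lockup messages can be pulled out of the console or journal output mechanically. A minimal sketch with a hypothetical `parse_soft_lockup` helper, assuming only the message format seen in this log:

```python
import re

# Matches the soft lockup format seen above, e.g.
# "NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]".
# comm is matched greedily up to the last ':' so names like "kworker/0:1" work.
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return (cpu, seconds, comm, pid) for a soft lockup line, else None."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return (int(m.group("cpu")), int(m.group("secs")),
            m.group("comm"), int(m.group("pid")))


if __name__ == "__main__":
    log = [
        "[ 568.394870] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ethtool:26330]",
        "[ 575.418868] INFO: rcu_sched self-detected stall on CPU",
    ]
    for entry in log:
        print(parse_soft_lockup(entry))
```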
More Info in the journal:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [ethtool:26330]
Modules linked in: openvswitch nf_defrag_ipv6 nf_conntrack isofs ppdev kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul parport_pc parport joydev serio_raw iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd floppy
CPU: 0 PID: 26330 Comm: ethtool Not tainted 4.4.0-18-generic #34-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8801b747d280 ti: ffff8800ba58c000 task.ti: ffff8800ba58c000
RIP: 0010:[<ffffffff815f1a43>] [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150
RSP: 0018:ffff8800ba58fb60 EFLAGS: 00000246
RAX: 0000000000000000 RBX: ffff8800bba62840 RCX: ffff8801b64a9000
RDX: 000000000000c010 RSI: ffff8800ba58fb64 RDI: ffff8800bba6c400
RBP: ffff8800ba58fbf8 R08: 0000000000000004 R09: ffff8801b9001b00
R10: ffff8801b671b080 R11: 0000000000000246 R12: 0000000000000002
R13: ffff8800ba58fb88 R14: 0000000000000000 R15: 0000000000000004
FS: 00007fb57d56c700(0000) GS:ffff8801bfc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb57cd7b680 CR3: 00000000ba85a000 CR4: 00000000001406f0
Stack:
 ffff8800ba58fc28 ffffea0002ee9882 0000000200000940 0000000000000000
 0000000000000000 ffffea0002ee9882 0000000100000942 0000000000000000
 0000000000000000 ffff8800ba58fb68 ffff8800ba58fc10 ffff8800ba58fb88
Call Trace:
 [<ffffffff815f1d9a>] virtnet_set_queues+0x9a/0x100
 [<ffffffff815f1e52>] virtnet_set_channels+0x52/0xa0
 [<ffffffff8171fc3c>] ethtool_set_channels+0xfc/0x140
 [<ffffffff81720afd>] dev_ethtool+0x40d/0x1d70
 [<ffffffff811cafc5>] ? page_add_file_rmap+0x25/0x60
 [<ffffffff8172f8d5>] ? __rtnl_unlock+0x15/0x20
 [<ffffffff8171ec61>] ? netdev_run_todo+0x61/0x320
 [<ffffffff8118d8a9>] ? unlock_page+0x69/0x70
 [<ffffffff81733b42>] dev_ioctl+0x182/0x580
 [<ffffffff811bf9f4>] ? handle_mm_fault+0xe44/0x1820
 [<ffffffff816fb932>] sock_do_ioctl+0x42/0x50
 [<ffffffff816fbe32>] sock_ioctl+0x1d2/0x290
 [<ffffffff8121ff9f>] do_vfs_ioctl+0x29f/0x490
 [<ffffffff8106b554>] ? __do_page_fault+0x1b4/0x400
 [<ffffffff81220209>] SyS_ioctl+0x79/0x90
 [<ffffffff818243b2>] entry_SYSCALL_64_fastpath+0x16/0x71
Code: 44 89 e2 4c 89 6c c5 b0 e8 3b dc ec ff 48 8b 7b 08 e8 f2 db ec ff 84 c0 75 11 eb 24 48 8b 7b 08 e8 d3 d6 ec ff 84 c0 75 17 f3 90 <48> 8b 7b 08 48 8d b5 6c ff ff ff e8 4d e0 ec ff 48 85 c0 74 dc
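The `Code:` line can be cross-checked against the register dump. This is a minimal sketch based on my own reading of the bytes, not verified against the 4.4.0-18 binary; all helper names are hypothetical. As I read it, the two bytes right before the `<48>` RIP marker are `f3 90` (PAUSE, which is what `cpu_relax()` emits on x86), the instruction after `mov rdi,[rbx+8]` is `lea rsi,[rbp-0x94]`, whose result matches RSI = RBP - 0x94 in the dump, and the trailing `74 dc` is a `je` jumping backwards to before RIP. Together that looks like a polling loop around the RIP:

```python
# x86 PAUSE instruction; cpu_relax() compiles to this on x86.
PAUSE = [0xF3, 0x90]


def split_code_line(code_line):
    """Split an oops 'Code:' line into (byte values, index of the RIP byte).

    The kernel marks the byte at RIP with angle brackets, e.g. '<48>'.
    """
    tokens = code_line.split(":", 1)[1].split()
    data, rip_index = [], None
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            rip_index = i
            tok = tok[1:-1]
        data.append(int(tok, 16))
    return data, rip_index


def pause_before(data, index):
    """True if the two bytes right before `index` encode PAUSE."""
    return data[index - 2:index] == PAUSE


def lea_rbp_disp32(data, start):
    """Signed disp32 of 'lea r64, [rbp+disp32]' (REX.W 8D, ModRM mod=10 rm=101)."""
    if data[start:start + 2] != [0x48, 0x8D]:
        raise ValueError("not a REX.W lea")
    modrm = data[start + 2]
    if modrm >> 6 != 0b10 or modrm & 0b111 != 0b101:
        raise ValueError("not [rbp+disp32] addressing")
    return int.from_bytes(bytes(data[start + 3:start + 7]), "little", signed=True)


def rel8_target(data, start):
    """Byte index a short conditional jump (opcode 7x, rel8) at `start` goes to."""
    rel = data[start + 1] - 256 if data[start + 1] >= 0x80 else data[start + 1]
    return start + 2 + rel


CODE = ("Code: 44 89 e2 4c 89 6c c5 b0 e8 3b dc ec ff 48 8b 7b 08 e8 f2 db ec ff "
        "84 c0 75 11 eb 24 48 8b 7b 08 e8 d3 d6 ec ff 84 c0 75 17 f3 90 <48> "
        "8b 7b 08 48 8d b5 6c ff ff ff e8 4d e0 ec ff 48 85 c0 74 dc")

if __name__ == "__main__":
    data, rip = split_code_line(CODE)
    print("RIP index:", rip, "PAUSE before RIP:", pause_before(data, rip))
    disp = lea_rbp_disp32(data, rip + 4)                # 48 8d b5 6c ff ff ff
    rbp, rsi = 0xFFFF8800BA58FBF8, 0xFFFF8800BA58FB64   # from the register dump
    print("lea disp:", hex(disp), "RBP+disp == RSI:", rbp + disp == rsi)
    print("final je jumps back to index:", rel8_target(data, len(data) - 2))
```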
Sometimes there is this on top:
 [<ffffffff815f1a53>] ? virtnet_send_command+0x103/0x150
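Need to check if there is a loop in virtnet_set_queues that could call virtnet_send_command infinitely. Since the RIP itself is inside virtnet_send_command (+0xf3), a busy-wait inside virtnet_send_command is a candidate as well.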
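To make the second candidate concrete: as I read the 4.4 source, virtnet_send_command kicks the control virtqueue and then spins on `virtqueue_get_buf()` until it returns a buffer or `virtqueue_is_broken()` is true, calling `cpu_relax()` in between. Below is a simplified userspace model of that kind of wait, not driver code; `FakeControlQueue` and `wait_for_reply` are hypothetical, and `max_spins` exists only so the model terminates. If the host never completes the request and the queue is never marked broken, neither exit condition becomes true:

```python
class FakeControlQueue:
    """Hypothetical stand-in for the control virtqueue (not a kernel API).

    reply_after: number of polls before the "host" answers; None = never.
    broken: whether the queue is marked broken.
    """

    def __init__(self, reply_after=None, broken=False):
        self.reply_after = reply_after
        self.broken = broken
        self.polls = 0

    def get_buf(self):
        self.polls += 1
        if self.reply_after is not None and self.polls > self.reply_after:
            return "ack"
        return None

    def is_broken(self):
        return self.broken


def wait_for_reply(cvq, max_spins):
    """Poll until a reply arrives or the queue is broken.

    Returns the number of spins, or None if max_spins was reached. The bound
    only keeps the model finite; the modelled kernel loop has no such bound.
    """
    spins = 0
    while cvq.get_buf() is None and not cvq.is_broken():
        spins += 1
        if spins >= max_spins:
            return None
        # The kernel loop calls cpu_relax() here (PAUSE on x86).
    return spins


if __name__ == "__main__":
    print("host answers after 3 polls:", wait_for_reply(FakeControlQueue(reply_after=3), 100_000))
    print("queue broken:", wait_for_reply(FakeControlQueue(broken=True), 100_000))
    print("host never answers:", wait_for_reply(FakeControlQueue(), 100_000))
```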
Being stuck in the kernel explains why signals have no effect and tracers can't attach.
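ps shows STAT R+ with no WCHAN, i.e. the task is running rather than sleeping, which fits a CPU spinning in kernel code. Next time this could be confirmed from an already open session by sampling /proc/<pid>/stat twice: if stime (field 15) keeps growing while utime (field 14) stays flat, the time goes into the kernel. A minimal sketch; `parse_proc_stat` and `read_proc_stat` are hypothetical helpers, and it assumes that reading /proc for the stuck task does not block as well:

```python
import os
import sys
import time


def parse_proc_stat(text):
    """Return (pid, comm, state, utime, stime) from /proc/<pid>/stat content.

    comm is wrapped in parentheses and may contain spaces or ')', so the
    remaining fields are taken from after the last ')'.
    """
    lpar = text.index("(")
    rpar = text.rindex(")")
    pid = int(text[:lpar].strip())
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 1:].split()
    state = fields[0]           # field 3
    utime = int(fields[11])     # field 14, clock ticks in user mode
    stime = int(fields[12])     # field 15, clock ticks in kernel mode
    return pid, comm, state, utime, stime


def read_proc_stat(pid):
    with open(f"/proc/{pid}/stat") as f:
        return parse_proc_stat(f.read())


if __name__ == "__main__":
    # Defaults to this process; pass the stuck PID (e.g. ethtool) as argv[1].
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    _, comm, _, u0, s0 = read_proc_stat(pid)
    time.sleep(1)
    _, _, state, u1, s1 = read_proc_stat(pid)
    hz = os.sysconf("SC_CLK_TCK")
    print(f"{comm}: state={state} user={(u1 - u0) / hz:.2f}s sys={(s1 - s0) / hz:.2f}s in 1s")
```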
Note: we are already on today's kernel:
Linux guest-virtio-dpdk 4.4.0-18-generic #34-Ubuntu SMP Wed Apr 6 14:01:02 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Existing ssh sessions still seem to work, but new sessions get stuck as well; need to prepare better next time :-)
Next Steps:
- analyze the code pointed out by the hangs
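For that step, the frames from the RIP line and call trace can be turned into source lookups against a vmlinux with debug info for the same kernel (assumed to be available, e.g. from the matching debug-symbol package). A minimal sketch; `parse_frames` and `gdb_list_commands` are hypothetical helpers, frames marked `?` are skipped because the unwinder flags them as unreliable, and the output uses gdb's `list *(function+offset)` form:

```python
import re

# One frame: "[<addr>] [? ]symbol+0xoff/0xsize"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

TRACE = """\
RIP: 0010:[<ffffffff815f1a43>] [<ffffffff815f1a43>] virtnet_send_command+0xf3/0x150
 [<ffffffff815f1d9a>] virtnet_set_queues+0x9a/0x100
 [<ffffffff815f1e52>] virtnet_set_channels+0x52/0xa0
 [<ffffffff8172f8d5>] ? __rtnl_unlock+0x15/0x20
"""


def parse_frames(trace):
    """Return a list of frame dicts in the order they appear in the trace."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "reliable": m.group("unsure") is None,
            "sym": m.group("sym"),
            "off": int(m.group("off"), 16),
            "size": int(m.group("size"), 16),
        })
    return frames


def gdb_list_commands(frames):
    """gdb 'list' commands for the reliable frames only."""
    return [f"list *({f['sym']}+0x{f['off']:x})" for f in frames if f["reliable"]]


if __name__ == "__main__":
    for cmd in gdb_list_commands(parse_frames(TRACE)):
        print(cmd)
```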