Comment 5 for bug 1874464

Nisalon Caje (nisalon-caje) wrote :

I have the exact same bug on focal.

However, for me it happens at random, while the server is under load.
Two of the servers I migrated to focal have the same issue (they are not in the same DC), which rules out a hardware fault on this particular machine.

Here is my syslog:
May 26 10:04:15 service01K kernel: [161735.901135] ------------[ cut here ]------------
May 26 10:04:15 service01K kernel: [161735.901136] NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 2 timed out
May 26 10:04:15 service01K kernel: [161735.901145] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:447 dev_watchdog+0x258/0x260
May 26 10:04:15 service01K kernel: [161735.901146] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport isofs ip6table_filter ip6_tables xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif kvm_intel kvm joydev input_leds ipmi_si ipmi_devintf ipmi_msghandler video acpi_pad acpi_tad sch_fq_codel ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear hid_generic uas usbhid hid usb_storage raid1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt crypto_simd fb_sys_fops ixgbe cryptd glue_helper nvme drm xfrm_algo ahci dca mdio libahci nvme_core
May 26 10:04:15 service01K kernel: [161735.901163] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.4.0-31-generic #35-Ubuntu
May 26 10:04:15 service01K kernel: [161735.901163] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./E3C246D4U2-2T, BIOS L2.02K 12/18/2019
May 26 10:04:15 service01K kernel: [161735.901164] RIP: 0010:dev_watchdog+0x258/0x260
May 26 10:04:15 service01K kernel: [161735.901165] Code: 85 c0 75 e5 eb 9f 4c 89 ff c6 05 ef f6 e7 00 01 e8 6d bb fa ff 44 89 e9 4c 89 fe 48 c7 c7 40 73 43 ba 48 89 c2 e8 03 30 71 ff <0f> 0b eb 80 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7
May 26 10:04:15 service01K kernel: [161735.901165] RSP: 0018:ffffb8774003ce30 EFLAGS: 00010286
May 26 10:04:15 service01K kernel: [161735.901166] RAX: 0000000000000000 RBX: ffff891bdf924ec0 RCX: 0000000000000006
May 26 10:04:15 service01K kernel: [161735.901166] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff891bee8578c0
May 26 10:04:15 service01K kernel: [161735.901167] RBP: ffffb8774003ce60 R08: 000000000000046b R09: 0000000000000004
May 26 10:04:15 service01K kernel: [161735.901167] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000040
May 26 10:04:15 service01K kernel: [161735.901167] R13: 0000000000000002 R14: ffff891bdf980480 R15: ffff891bdf980000
May 26 10:04:15 service01K kernel: [161735.901168] FS: 0000000000000000(0000) GS:ffff891bee840000(0000) knlGS:0000000000000000
May 26 10:04:15 service01K kernel: [161735.901168] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 26 10:04:15 service01K kernel: [161735.901169] CR2: 00007fb0077d6148 CR3: 0000000894f1c006 CR4: 00000000003606e0
May 26 10:04:15 service01K kernel: [161735.901169] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 26 10:04:15 service01K kernel: [161735.901169] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
May 26 10:04:15 service01K kernel: [161735.901170] Call Trace:
May 26 10:04:15 service01K kernel: [161735.901171] <IRQ>
May 26 10:04:15 service01K kernel: [161735.901173] ? pfifo_fast_enqueue+0x150/0x150
May 26 10:04:15 service01K kernel: [161735.901175] call_timer_fn+0x32/0x130
May 26 10:04:15 service01K kernel: [161735.901176] __run_timers.part.0+0x180/0x280
May 26 10:04:15 service01K kernel: [161735.901177] ? enqueue_hrtimer+0x3d/0x90
May 26 10:04:15 service01K kernel: [161735.901178] ? recalibrate_cpu_khz+0x10/0x10
May 26 10:04:15 service01K kernel: [161735.901179] ? ktime_get+0x3e/0xa0
May 26 10:04:15 service01K kernel: [161735.901180] run_timer_softirq+0x2a/0x50
May 26 10:04:15 service01K kernel: [161735.901181] __do_softirq+0xe1/0x2d6
May 26 10:04:15 service01K kernel: [161735.901182] ? hrtimer_interrupt+0x13b/0x220
May 26 10:04:15 service01K kernel: [161735.901183] irq_exit+0xae/0xb0
May 26 10:04:15 service01K kernel: [161735.901184] smp_apic_timer_interrupt+0x7b/0x140
May 26 10:04:15 service01K kernel: [161735.901185] apic_timer_interrupt+0xf/0x20
May 26 10:04:15 service01K kernel: [161735.901186] </IRQ>
May 26 10:04:15 service01K kernel: [161735.901187] RIP: 0010:cpuidle_enter_state+0xc5/0x450
May 26 10:04:15 service01K kernel: [161735.901187] Code: ff e8 9f 04 81 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 65 03 00 00 31 ff e8 f2 74 87 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8f 02 00 00 49 63 cd 4c 8b 7d d0 4c 2b 7d c8 48 8d
May 26 10:04:15 service01K kernel: [161735.901188] RSP: 0018:ffffb877400e3e38 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
May 26 10:04:15 service01K kernel: [161735.901188] RAX: ffff891bee86ad00 RBX: ffffffffba759dc0 RCX: 000000000000001f
May 26 10:04:15 service01K kernel: [161735.901189] RDX: 0000000000000000 RSI: 00000000258eeee5 RDI: 0000000000000000
May 26 10:04:15 service01K kernel: [161735.901189] RBP: ffffb877400e3e78 R08: 0000931912eef131 R09: 00000000000000b0
May 26 10:04:15 service01K kernel: [161735.901189] R10: ffff891bee869a00 R11: ffff891bee8699e0 R12: ffff891bee875a20
May 26 10:04:15 service01K kernel: [161735.901190] R13: 0000000000000001 R14: 0000000000000001 R15: ffff891bee875a20
May 26 10:04:15 service01K kernel: [161735.901191] ? cpuidle_enter_state+0xa1/0x450
May 26 10:04:15 service01K kernel: [161735.901191] cpuidle_enter+0x2e/0x40
May 26 10:04:15 service01K kernel: [161735.901192] call_cpuidle+0x23/0x40
May 26 10:04:15 service01K kernel: [161735.901193] do_idle+0x1dd/0x270
May 26 10:04:15 service01K kernel: [161735.901194] cpu_startup_entry+0x20/0x30
May 26 10:04:15 service01K kernel: [161735.901195] start_secondary+0x167/0x1c0
May 26 10:04:15 service01K kernel: [161735.901196] secondary_startup_64+0xa4/0xb0
May 26 10:04:15 service01K kernel: [161735.901198] ---[ end trace 8367c52cc2c9c7ea ]---
May 26 10:04:15 service01K kernel: [161735.901200] ixgbe 0000:04:00.1 eth1: initiating reset due to tx timeout
May 26 10:04:15 service01K kernel: [161735.901238] ixgbe 0000:04:00.1 eth1: Reset adapter
May 26 10:04:18 service01K systemd-networkd[2915505]: eth1: Lost carrier
May 26 10:04:19 service01K ntpd[1007]: Deleting interface #4 eth1, 192.168.52.129#123, interface stats: received=0, sent=0, dropped=0, active_time=161720 secs
May 26 10:04:19 service01K ntpd[1007]: Deleting interface #8 eth1, fe80::d250:99ff:fed6:91ef%3#123, interface stats: received=0, sent=0, dropped=0, active_time=161720 secs
May 26 10:04:29 service01K systemd-networkd[2915505]: eth1: Gained carrier
May 26 10:04:29 service01K kernel: [161749.910201] ixgbe 0000:04:00.1 eth1: NIC Link is Up 10 Gbps, Flow Control: None
May 26 10:04:30 service01K ntpd[1007]: Listen normally on 9 eth1 192.168.52.129:123
May 26 10:04:30 service01K ntpd[1007]: Listen normally on 10 eth1 [fe80::d250:99ff:fed6:91ef%3]:123
May 26 10:04:30 service01K ntpd[1007]: new interface(s) found: waking up resolver
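
In case it helps anyone compare occurrences across machines, here is a rough sketch of how the watchdog lines could be pulled out of a syslog. It only assumes the "NETDEV WATCHDOG" line format shown in the log above; the function name `find_tx_timeouts` and the sample list are made up for illustration and are not part of any existing tool.

```python
import re

# Matches the kernel watchdog line in the format seen in the syslog above, e.g.
# "NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 2 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_tx_timeouts(lines):
    """Return a list of (interface, driver, queue) for each tx timeout line.

    Hypothetical helper: `lines` is any iterable of syslog lines.
    """
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return events


# Two lines copied from the log above: one watchdog line, one unrelated line.
sample = [
    "May 26 10:04:15 service01K kernel: [161735.901136] "
    "NETDEV WATCHDOG: eth1 (ixgbe): transmit queue 2 timed out",
    "May 26 10:04:15 service01K kernel: [161735.901238] "
    "ixgbe 0000:04:00.1 eth1: Reset adapter",
]

print(find_tx_timeouts(sample))  # prints [('eth1', 'ixgbe', 2)]
```

To run it against a real log, pass an open file object instead of `sample`; the path of the log file depends on the system and is left out here on purpose.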