Comment 0 for bug 1981658

Haw Loeung (hloeung) wrote :

Hi,

We rebooted one of the main US Ubuntu Archive servers into the latest HWE kernel, 5.4.0-122, and on doing so ran into this kernel panic:

| [ 350.776585] BUG: kernel NULL pointer dereference, address: 0000000000000008
| [ 350.783674] #PF: supervisor read access in kernel mode
| [ 350.788846] #PF: error_code(0x0000) - not-present page
| [ 350.794019] PGD 0 P4D 0
| [ 350.796631] Oops: 0000 [#1] SMP NOPTI
| [ 350.800425] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 5.4.0-122-generic #138~18.04.1-Ubuntu
| [ 350.808918] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 02/10/2022
| [ 350.817666] RIP: 0010:tcp_create_openreq_child+0x2e1/0x3e0
| [ 350.823187] Code: 08 00 00 41 8b 84 24 18 01 00 00 48 c7 83 80 08 00 00 00 00 00 00 4c 89 e6 4c 89 ef 89 83 c4 05 00 00 49 8b 84 24 f8 00 00 00 <48> 8b 40 08 e8 96 28 42 00 48 85 c0 0f b7 83 68 05 00 00 74 0a 83
| [ 350.842068] RSP: 0018:ffff9a958cce8858 EFLAGS: 00010246
| [ 350.847324] RAX: 0000000000000000 RBX: ffff897618739c80 RCX: 0000000000000007
| [ 350.854502] RDX: 0000000000000020 RSI: ffff897607afb0b0 RDI: ffff897605c85580
| [ 350.861682] RBP: ffff9a958cce8878 R08: 0000000000000178 R09: ffff89763e407800
| [ 350.868859] R10: 00000000000004c4 R11: ffff9a958cce89c7 R12: ffff897607afb0b0
| [ 350.876039] R13: ffff897605c85580 R14: ffff8976205fbe00 R15: ffff89762688b400
| [ 350.883219] FS: 0000000000000000(0000) GS:ffff89763ec00000(0000) knlGS:0000000000000000
| [ 350.891358] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| [ 350.897138] CR2: 0000000000000008 CR3: 0000001fd7914000 CR4: 0000000000340ee0
| [ 350.904319] Call Trace:
| [ 350.906787] <IRQ>
| [ 350.908824] tcp_v6_syn_recv_sock+0x8d/0x710
| [ 350.913259] ? ip6_route_output_flags_noref+0xd0/0x110
| [ 350.918435] tcp_get_cookie_sock+0x48/0x140
| [ 350.922688] cookie_v6_check+0x5a2/0x700
| [ 350.926714] tcp_v6_do_rcv+0x36c/0x3e0
| [ 350.930589] ? tcp_v6_do_rcv+0x36c/0x3e0
| [ 350.934589] tcp_v6_rcv+0xa16/0xa60
| [ 350.938102] ip6_protocol_deliver_rcu+0xd8/0x4d0
| [ 350.942750] ip6_input+0x41/0xb0
| [ 350.946000] ip6_sublist_rcv_finish+0x42/0x60
| [ 350.950385] ip6_sublist_rcv+0x235/0x260
| [ 350.954333] ? __netif_receive_skb_core+0x19d/0xc60
| [ 350.959245] ipv6_list_rcv+0x110/0x140
| [ 350.963018] __netif_receive_skb_list_core+0x157/0x260
| [ 350.968192] ? build_skb+0x17/0x80
| [ 350.971615] netif_receive_skb_list_internal+0x187/0x2a0
| [ 350.976961] gro_normal_list.part.131+0x1e/0x40
| [ 350.981519] napi_complete_done+0x94/0x120
| [ 350.985700] mlx5e_napi_poll+0x178/0x630 [mlx5_core]
| [ 350.990697] net_rx_action+0x140/0x3e0
| [ 350.994475] __do_softirq+0xe4/0x2da
| [ 350.998079] irq_exit+0xae/0xb0
| [ 351.001239] do_IRQ+0x59/0xe0
| [ 351.004228] common_interrupt+0xf/0xf
| [ 351.007913] </IRQ>
| [ 351.010029] RIP: 0010:cpuidle_enter_state+0xbc/0x440
| [ 351.015023] Code: ff e8 b8 ca 80 ff 80 7d d3 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 54 03 00 00 31 ff e8 4b 4f 87 ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 1a 03 00 00 4c 2b 7d c8 48 ba cf f7 53 e3 a5 9b c4
| [ 351.033952] RSP: 0018:ffff9a958026fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd6
| [ 351.041633] RAX: ffff89763ec2fe00 RBX: ffffffff84b66b40 RCX: 000000000000001f
| [ 351.048816] RDX: 00000051abe96150 RSI: 000000002abf3234 RDI: 0000000000000000
| [ 351.055997] RBP: ffff9a958026fe88 R08: 0000000000000002 R09: 000000000002f680
| [ 351.063176] R10: ffff9a958026fe18 R11: 0000000000000115 R12: ffff8976274c3800
| [ 351.070355] R13: 0000000000000001 R14: ffffffff84b66bb8 R15: 00000051abe96150
| [ 351.077540] ? cpuidle_enter_state+0x98/0x440
| [ 351.081930] ? menu_select+0x377/0x600
| [ 351.085706] cpuidle_enter+0x2e/0x40
| [ 351.089310] call_cpuidle+0x23/0x40
| [ 351.092821] do_idle+0x1f6/0x270
| [ 351.096069] cpu_startup_entry+0x1d/0x20
| [ 351.100024] start_secondary+0x166/0x1c0
| [ 351.103977] secondary_startup_64+0xa4/0xb0
| [ 351.108186] Modules linked in: binfmt_misc bonding nls_iso8859_1 ipmi_ssif edac_mce_amd kvm_amd kvm hpilo ccp ipmi_si ipmi_devintf ipmi_msghandler acpi_tad k10temp mac_hid acpi_power_meter sch_fq tcp_bbr ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear mlx5_ib raid1 ses enclosure ib_uverbs ib_core mgag200 drm_vram_helper ttm drm_kms_helper syscopyarea crct10dif_pclmul sysfillrect mlx5_core crc32_pclmul sysimgblt smartpqi fb_sys_fops uas ghash_clmulni_intel aesni_intel crypto_simd igb pci_hyperv_intf cryptd glue_helper usb_storage dca tls drm i2c_algo_bit scsi_transport_sas mlxfw nvme i2c_piix4 nvme_core wmi
| [ 351.180156] CR2: 0000000000000008
| [ 351.183629] ---[ end trace 23210cdf0c6d5851 ]---
| [ 351.322276] RIP: 0010:tcp_create_openreq_child+0x2e1/0x3e0
| [ 351.327974] Code: 08 00 00 41 8b 84 24 18 01 00 00 48 c7 83 80 08 00 00 00 00 00 00 4c 89 e6 4c 89 ef 89 83 c4 05 00 00 49 8b 84 24 f8 00 00 00 <48> 8b 40 08 e8 96 28 42 00 48 85 c0 0f b7 83 68 05 00 00 74 0a 83
| [ 351.346878] RSP: 0018:ffff9a958cce8858 EFLAGS: 00010246
| [ 351.352166] RAX: 0000000000000000 RBX: ffff897618739c80 RCX: 0000000000000007
| [ 351.359348] RDX: 0000000000000020 RSI: ffff897607afb0b0 RDI: ffff897605c85580
| [ 351.366526] RBP: ffff9a958cce8878 R08: 0000000000000178 R09: ffff89763e407800
| [ 351.373705] R10: 00000000000004c4 R11: ffff9a958cce89c7 R12: ffff897607afb0b0
| [ 351.380886] R13: ffff897605c85580 R14: ffff8976205fbe00 R15: ffff89762688b400
| [ 351.388065] FS: 0000000000000000(0000) GS:ffff89763ec00000(0000) knlGS:0000000000000000
| [ 351.396203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| [ 351.401982] CR2: 0000000000000008 CR3: 0000001fd7914000 CR4: 0000000000340ee0
| [ 351.409162] Kernel panic - not syncing: Fatal exception in interrupt
| [ 351.415613] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
| [ 351.437793] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Per ~IS-Outage on Mattermost, we tried various older kernels and -121 works fine, so the regression looks to have been introduced in -122 (maybe LP:1978719?).
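
For anyone triaging, here is a minimal sketch for pulling the faulting instruction bytes out of the "Code:" line of the oops above. It relies only on the kernel convention that the byte wrapped in <..> is the one at RIP. The helper name split_code_line is made up for illustration and is not part of any existing tool.

```python
import re


def split_code_line(line):
    """Split an oops 'Code:' line into (bytes before RIP, bytes from RIP on).

    Hypothetical helper: the <xx> token marks the first byte at RIP.
    """
    match = re.search(r"Code:\s*(.*)", line)
    if match is None:
        raise ValueError("no 'Code:' field in line")

    before, at_rip = [], []
    seen_marker = False
    for token in match.group(1).split():
        if token.startswith("<") and token.endswith(">"):
            seen_marker = True
            token = token[1:-1]  # strip the angle brackets
        (at_rip if seen_marker else before).append(int(token, 16))

    if not seen_marker:
        raise ValueError("no <xx> marker found in 'Code:' line")
    return bytes(before), bytes(at_rip)


# The 'Code:' line copied from the oops in this report.
oops_line = (
    "| [ 350.823187] Code: 08 00 00 41 8b 84 24 18 01 00 00 48 c7 83 80 08 "
    "00 00 00 00 00 00 4c 89 e6 4c 89 ef 89 83 c4 05 00 00 49 8b 84 24 f8 "
    "00 00 00 <48> 8b 40 08 e8 96 28 42 00 48 85 c0 0f b7 83 68 05 00 00 "
    "74 0a 83"
)

before, at_rip = split_code_line(oops_line)
print(at_rip[:4].hex(" "))  # first four bytes at RIP
```

The four bytes at RIP, 48 8b 40 08, decode as far as I can tell to mov 0x8(%rax),%rax, and RAX is 0 in the register dump, which lines up with the faulting address of 0x8 in CR2. Feeding the whole oops through scripts/decodecode from the kernel tree should give a fuller disassembly.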