kernel NULL pointer dereference during xen hibernation
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux-aws (Ubuntu) |
Won't Fix
|
Undecided
|
Unassigned |
Bug Description
SRU Justification
[Impact]
Encountered the following panic while doing hibernation/resume testing with linux-aws 5.8 on Focal on an m3.xlarge (xen) instance type:
[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_
[ 594.424678] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424679] RSP: 0018:ffffc90000
[ 594.424680] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424681] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424682] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424683] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424684] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424686] FS: 00007f65ba05598
[ 594.424687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424688] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424692] Call Trace:
[ 594.424699] xennet_
[ 594.424704] dev_hard_
[ 594.424706] ? validate_
[ 594.424710] sch_direct_
[ 594.424712] __dev_queue_
[ 594.424714] ? neigh_add_
[ 594.424716] dev_queue_
[ 594.424717] neigh_resolve_
[ 594.424721] ip_finish_
[ 594.424723] __ip_finish_
[ 594.424725] ip_finish_
[ 594.424728] ip_output+0x7a/0xf0
[ 594.424730] ? __ip_finish_
[ 594.424732] ip_local_
[ 594.424734] ip_send_
[ 594.424737] udp_send_
[ 594.424739] udp_sendmsg+
[ 594.424742] ? ip_reply_
[ 594.424747] ? delete_
[ 594.424750] ? _cond_resched+
[ 594.424754] ? aa_sk_perm+
[ 594.424757] inet_sendmsg+
[ 594.424759] ? security_
[ 594.424760] ? inet_sendmsg+
[ 594.424764] sock_sendmsg+
[ 594.424766] __sys_sendto+
[ 594.424770] ? __secure_
[ 594.424774] ? syscall_
[ 594.424777] __x64_sys_
[ 594.424781] do_syscall_
[ 594.424783] entry_SYSCALL_
[ 594.424785] RIP: 0033:0x7f65baee4844
[ 594.424788] Code: 42 3f f7 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 30 89 ef 48 89 44 24 08 e8 68 3f f7 ff 48 8b
[ 594.424789] RSP: 002b:00007ffe9b
[ 594.424790] RAX: ffffffffffffffda RBX: 00007ffe9b5fd4e0 RCX: 00007f65baee4844
[ 594.424791] RDX: 0000000000000030 RSI: 00007ffe9b5fd3f0 RDI: 0000000000000010
[ 594.424792] RBP: 0000000000000000 R08: 0000560426541678 R09: 0000000000000010
[ 594.424793] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[ 594.424794] R13: 00007ffe9b5fd3e4 R14: 0000000000000068 R15: 0000000000000000
[ 594.424796] Modules linked in: btrfs blake2b_generic xor raid6_pq ufs msdos xfs libcrc32c dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd input_leds glue_helper serio_raw floppy sch_fq_codel drm ip_tables x_tables autofs4
[ 594.424813] CR2: 00000000000001f4
[ 594.424821] ---[ end trace bb5f35055c1a8060 ]---
[ 594.424822] RIP: 0010:_raw_
[ 594.424824] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424824] RSP: 0018:ffffc90000
[ 594.424825] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424826] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424826] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424827] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424827] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424828] FS: 00007f65ba05598
[ 594.424829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424830] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424830] Kernel panic - not syncing: Fatal exception in interrupt
[ 594.425091] Kernel Offset: disabled
This log was captured from the AWS system log (which is not a live console as this is a xen instance type). From the user perspective, the instance appears to have just rebooted during the hibernation / resume. It's only by logging the console that we have any information at all.
This is not a frequent event. From 100 test runs on this instance, two failures were observed: this failure and another which left no traces of the issue.
[Test Plan]
Repeatedly hibernate/resume an EC2 instance.
[Where problems could occur]
The network must be started as if from boot. DHCP could fail.
Changed in linux-aws (Ubuntu): | |
status: | Confirmed → Won't Fix |
Logs: spin_lock_ irqsave+ 0x23/0x40 start_xmit+ 0x158/0x570
[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_
[...]
[ 594.424692] Call Trace:
[ 594.424699] xennet_
By looking the assembly, this is where it fails:
ffffffff8182d8e0 <xennet_ start_xmit> : start_xmit+ 0x47a>
[...]
ffffffff8182da0a: 48 83 f8 0d cmp $0xd,%rax
ffffffff8182da0e: 0f 86 46 03 00 00 jbe ffffffff8182dd5a <xennet_
### pahole --hex -C netfront_queue usr/lib/ debug/boot/ vmlinux- 5.8.0-1035- aws |grep lock
###### spinlock_t tx_lock; /* 0x1f4 0x4 */
ffffffff8182da14: 49 8d 86 f4 01 00 00 lea 0x1f4(%r14),%rax # <-- #rax = &queue->tx_lock lock_irqsave> #<-- OOPS here
ffffffff8182da1b: 45 8b 4c 24 70 mov 0x70(%r12),%r9d
ffffffff8182da20: 45 2b 4c 24 74 sub 0x74(%r12),%r9d
ffffffff8182da25: 48 89 c7 mov %rax,%rdi # <-- %rdi = %rax
ffffffff8182da28: 44 89 4d 94 mov %r9d,-0x6c(%rbp)
ffffffff8182da2c: 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp)
ffffffff8182da33: e8 38 52 36 00 callq ffffffff81b92c70 <_raw_spin_
[...]
By correlating with the code, we have this in C:
static netdev_tx_t xennet_ start_xmit( struct sk_buff *skb, struct net_device *dev)
[...]
len = skb_headlen(skb);
spin_lock_ irqsave( &queue- >tx_lock, flags); // <<< HERE
if (unlikely( !netif_ carrier_ ok(dev) ||
[...]
Happens that queue->tx_lock is NULL.
What is interesting is the message:
[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
So, it means the hibernation woke in a different compute node than it went sleeping. I'm still not 100% sure of why that would cause such OOPS...but I have 2 ideas to either prevent it and validate that hypothesis:
(a) To modprobe unload the xen network driver right before hibernation and load it in the last stage of wake-up - I've done that myself in the hibernation scripts. This would likely prevent this issue.
(b) For testing personnel : maybe you're able to "lock" the testing to sleep/wake-up *always* in the same compute node/instance. If that prevents the issue, we're sure that the difference between the nodes rom sleep/wake-up is triggering some memory corruption in the device queue, that is somewhat propagated to the kernel memory.