kernel NULL pointer dereference during xen hibernation

Bug #1934424 reported by Francis Ginther
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux-aws (Ubuntu)
Won't Fix
Undecided
Unassigned

Bug Description

SRU Justification

[Impact]

Encountered the following panic while doing hibernation/resume testing with linux-aws 5.8 on Focal on an m3.xlarge (xen) instance type:

[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[ 594.424678] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424679] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
[ 594.424680] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424681] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424682] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424683] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424684] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424686] FS: 00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) knlGS:0000000000000000
[ 594.424687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424688] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424692] Call Trace:
[ 594.424699] xennet_start_xmit+0x158/0x570
[ 594.424704] dev_hard_start_xmit+0x91/0x1f0
[ 594.424706] ? validate_xmit_skb+0x300/0x340
[ 594.424710] sch_direct_xmit+0x113/0x340
[ 594.424712] __dev_queue_xmit+0x57c/0x8e0
[ 594.424714] ? neigh_add_timer+0x37/0x60
[ 594.424716] dev_queue_xmit+0x10/0x20
[ 594.424717] neigh_resolve_output+0x112/0x1c0
[ 594.424721] ip_finish_output2+0x19b/0x590
[ 594.424723] __ip_finish_output+0xc8/0x1e0
[ 594.424725] ip_finish_output+0x2d/0xb0
[ 594.424728] ip_output+0x7a/0xf0
[ 594.424730] ? __ip_finish_output+0x1e0/0x1e0
[ 594.424732] ip_local_out+0x3d/0x50
[ 594.424734] ip_send_skb+0x19/0x40
[ 594.424737] udp_send_skb.isra.0+0x165/0x390
[ 594.424739] udp_sendmsg+0xb0e/0xd50
[ 594.424742] ? ip_reply_glue_bits+0x50/0x50
[ 594.424747] ? delete_from_swap_cache+0x6a/0x90
[ 594.424750] ? _cond_resched+0x19/0x30
[ 594.424754] ? aa_sk_perm+0x43/0x1b0
[ 594.424757] inet_sendmsg+0x65/0x70
[ 594.424759] ? security_socket_sendmsg+0x35/0x50
[ 594.424760] ? inet_sendmsg+0x65/0x70
[ 594.424764] sock_sendmsg+0x5e/0x70
[ 594.424766] __sys_sendto+0x113/0x190
[ 594.424770] ? __secure_computing+0x42/0xe0
[ 594.424774] ? syscall_trace_enter+0x10d/0x280
[ 594.424777] __x64_sys_sendto+0x29/0x30
[ 594.424781] do_syscall_64+0x49/0xc0
[ 594.424783] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 594.424785] RIP: 0033:0x7f65baee4844
[ 594.424788] Code: 42 3f f7 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 30 89 ef 48 89 44 24 08 e8 68 3f f7 ff 48 8b
[ 594.424789] RSP: 002b:00007ffe9b5fd3a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 594.424790] RAX: ffffffffffffffda RBX: 00007ffe9b5fd4e0 RCX: 00007f65baee4844
[ 594.424791] RDX: 0000000000000030 RSI: 00007ffe9b5fd3f0 RDI: 0000000000000010
[ 594.424792] RBP: 0000000000000000 R08: 0000560426541678 R09: 0000000000000010
[ 594.424793] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[ 594.424794] R13: 00007ffe9b5fd3e4 R14: 0000000000000068 R15: 0000000000000000
[ 594.424796] Modules linked in: btrfs blake2b_generic xor raid6_pq ufs msdos xfs libcrc32c dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd input_leds glue_helper serio_raw floppy sch_fq_codel drm ip_tables x_tables autofs4
[ 594.424813] CR2: 00000000000001f4
[ 594.424821] ---[ end trace bb5f35055c1a8060 ]---
[ 594.424822] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[ 594.424824] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424824] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
[ 594.424825] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424826] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424826] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424827] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424827] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424828] FS: 00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) knlGS:0000000000000000
[ 594.424829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424830] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424830] Kernel panic - not syncing: Fatal exception in interrupt
[ 594.425091] Kernel Offset: disabled

This log was captured from the AWS system log (which is not a live console as this is a xen instance type). From the user perspective, the instance appears to have just rebooted during the hibernation / resume. It's only by logging the console that we have any information at all.

This is not a frequent event. From 100 test runs on this instance, two failures were observed: this failure and another which left no traces of the issue.

[Test Plan]

Repeatedly hibernate/resume an EC2 instance.

[Where problems could occur]

The network must be started as if from boot. DHCP could fail.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Logs:
[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[...]
[ 594.424692] Call Trace:
[ 594.424699] xennet_start_xmit+0x158/0x570

By looking the assembly, this is where it fails:

ffffffff8182d8e0 <xennet_start_xmit>:
[...]
ffffffff8182da0a: 48 83 f8 0d cmp $0xd,%rax
ffffffff8182da0e: 0f 86 46 03 00 00 jbe ffffffff8182dd5a <xennet_start_xmit+0x47a>

### pahole --hex -C netfront_queue usr/lib/debug/boot/vmlinux-5.8.0-1035-aws |grep lock
###### spinlock_t tx_lock; /* 0x1f4 0x4 */

ffffffff8182da14: 49 8d 86 f4 01 00 00 lea 0x1f4(%r14),%rax # <-- #rax = &queue->tx_lock
ffffffff8182da1b: 45 8b 4c 24 70 mov 0x70(%r12),%r9d
ffffffff8182da20: 45 2b 4c 24 74 sub 0x74(%r12),%r9d
ffffffff8182da25: 48 89 c7 mov %rax,%rdi # <-- %rdi = %rax
ffffffff8182da28: 44 89 4d 94 mov %r9d,-0x6c(%rbp)
ffffffff8182da2c: 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp)
ffffffff8182da33: e8 38 52 36 00 callq ffffffff81b92c70 <_raw_spin_lock_irqsave> #<-- OOPS here
[...]

By correlating with the code, we have this in C:

static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
[...]
len = skb_headlen(skb);

spin_lock_irqsave(&queue->tx_lock, flags); // <<< HERE

if (unlikely(!netif_carrier_ok(dev) ||
[...]

Happens that queue->tx_lock is NULL.
What is interesting is the message:

[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!

So, it means the hibernation woke in a different compute node than it went sleeping. I'm still not 100% sure of why that would cause such OOPS...but I have 2 ideas to either prevent it and validate that hypothesis:

(a) To modprobe unload the xen network driver right before hibernation and load it in the last stage of wake-up - I've done that myself in the hibernation scripts. This would likely prevent this issue.

(b) For testing personnel : maybe you're able to "lock" the testing to sleep/wake-up *always* in the same compute node/instance. If that prevents the issue, we're sure that the difference between the nodes rom sleep/wake-up is triggering some memory corruption in the device queue, that is somewhat propagated to the kernel memory.

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status: New → Confirmed
Revision history for this message
Tim Gardner (timg-tpi) wrote :

A workaround is to completely 'down' the network before hibernation. The patches as suggested by Francis Ginther had only 4 failures in 2500 hibernation cycles.

description: updated
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

We already ship lib/systemd/system-sleep/hibinit-agent that does post things in hibinit-resume to workaround things. Including stuff about networking, due to previous issues with xen-netfront.

Given we already restart systemd-networkd, I wonder if we should do better and not keep xen-netfront loaded in the kernel across suspend and resume.

as in
1) extend hibinit-agent.system-sleep to also have pre command
2) in the pre step unload xen-netfront
3) in the post step only load xen-netfront
4) potentially still stop systemd-networkd in pre; which will be restarted after loading xen-netfront in post

Since xen-netfront is causing us issues; and we are already unloading and reloading it; it makes sense to unload it before suspending altogether.

Tim Gardner (timg-tpi)
Changed in linux-aws (Ubuntu):
status: Confirmed → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.