Ubuntu
linux-aws package

kernel NULL pointer dereference during xen hibernation

Bug #1934424 reported by Francis Ginther on 2021-07-02

This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	linux-aws (Ubuntu)	Won't Fix	Undecided	Unassigned

Bug Description

SRU Justification

[Impact]

Encountered the following panic while doing hibernation/resume testing with linux-aws 5.8 on Focal on an m3.xlarge (xen) instance type:

[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[ 594.424678] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424679] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
[ 594.424680] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424681] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424682] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424683] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424684] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424686] FS: 00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) knlGS:0000000000000000
[ 594.424687] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424688] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424692] Call Trace:
[ 594.424699] xennet_start_xmit+0x158/0x570
[ 594.424704] dev_hard_start_xmit+0x91/0x1f0
[ 594.424706] ? validate_xmit_skb+0x300/0x340
[ 594.424710] sch_direct_xmit+0x113/0x340
[ 594.424712] __dev_queue_xmit+0x57c/0x8e0
[ 594.424714] ? neigh_add_timer+0x37/0x60
[ 594.424716] dev_queue_xmit+0x10/0x20
[ 594.424717] neigh_resolve_output+0x112/0x1c0
[ 594.424721] ip_finish_output2+0x19b/0x590
[ 594.424723] __ip_finish_output+0xc8/0x1e0
[ 594.424725] ip_finish_output+0x2d/0xb0
[ 594.424728] ip_output+0x7a/0xf0
[ 594.424730] ? __ip_finish_output+0x1e0/0x1e0
[ 594.424732] ip_local_out+0x3d/0x50
[ 594.424734] ip_send_skb+0x19/0x40
[ 594.424737] udp_send_skb.isra.0+0x165/0x390
[ 594.424739] udp_sendmsg+0xb0e/0xd50
[ 594.424742] ? ip_reply_glue_bits+0x50/0x50
[ 594.424747] ? delete_from_swap_cache+0x6a/0x90
[ 594.424750] ? _cond_resched+0x19/0x30
[ 594.424754] ? aa_sk_perm+0x43/0x1b0
[ 594.424757] inet_sendmsg+0x65/0x70
[ 594.424759] ? security_socket_sendmsg+0x35/0x50
[ 594.424760] ? inet_sendmsg+0x65/0x70
[ 594.424764] sock_sendmsg+0x5e/0x70
[ 594.424766] __sys_sendto+0x113/0x190
[ 594.424770] ? __secure_computing+0x42/0xe0
[ 594.424774] ? syscall_trace_enter+0x10d/0x280
[ 594.424777] __x64_sys_sendto+0x29/0x30
[ 594.424781] do_syscall_64+0x49/0xc0
[ 594.424783] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 594.424785] RIP: 0033:0x7f65baee4844
[ 594.424788] Code: 42 3f f7 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 30 89 ef 48 89 44 24 08 e8 68 3f f7 ff 48 8b
[ 594.424789] RSP: 002b:00007ffe9b5fd3a0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[ 594.424790] RAX: ffffffffffffffda RBX: 00007ffe9b5fd4e0 RCX: 00007f65baee4844
[ 594.424791] RDX: 0000000000000030 RSI: 00007ffe9b5fd3f0 RDI: 0000000000000010
[ 594.424792] RBP: 0000000000000000 R08: 0000560426541678 R09: 0000000000000010
[ 594.424793] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[ 594.424794] R13: 00007ffe9b5fd3e4 R14: 0000000000000068 R15: 0000000000000000
[ 594.424796] Modules linked in: btrfs blake2b_generic xor raid6_pq ufs msdos xfs libcrc32c dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd psmouse cryptd input_leds glue_helper serio_raw floppy sch_fq_codel drm ip_tables x_tables autofs4
[ 594.424813] CR2: 00000000000001f4
[ 594.424821] ---[ end trace bb5f35055c1a8060 ]---
[ 594.424822] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[ 594.424824] Code: 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 07 4c 89 e0 41 5c 5d c3 89 c6 e8 e9 d1 56 ff 66 90
[ 594.424824] RSP: 0018:ffffc900004e3848 EFLAGS: 00010046
[ 594.424825] RAX: 0000000000000000 RBX: ffff8883bcc0d000 RCX: 0000000000000e02
[ 594.424826] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000000001f4
[ 594.424826] RBP: ffffc900004e3850 R08: ffff8883b90b5ec0 R09: 000000000000005a
[ 594.424827] R10: ffffc900004e3910 R11: 0000000000000000 R12: 0000000000000206
[ 594.424827] R13: ffffea000ee42d40 R14: 0000000000000000 R15: 0000000000000001
[ 594.424828] FS: 00007f65ba055980(0000) GS:ffff8883c0ac0000(0000) knlGS:0000000000000000
[ 594.424829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 594.424830] CR2: 00000000000001f4 CR3: 00000003b99f0001 CR4: 00000000001606e0
[ 594.424830] Kernel panic - not syncing: Fatal exception in interrupt
[ 594.425091] Kernel Offset: disabled

This log was captured from the AWS system log (which is not a live console as this is a xen instance type). From the user perspective, the instance appears to have just rebooted during the hibernation / resume. It's only by logging the console that we have any information at all.

This is not a frequent event. From 100 test runs on this instance, two failures were observed: this failure and another which left no traces of the issue.

[Test Plan]

Repeatedly hibernate/resume an EC2 instance.

[Where problems could occur]

The network must be started as if from boot. DHCP could fail.

See original description

Revision history for this message

Guilherme G. Piccoli (gpiccoli) wrote on 2021-07-20:

Logs:
[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[ 594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[ 594.424658] #PF: supervisor write access in kernel mode
[ 594.424660] #PF: error_code(0x0002) - not-present page
[ 594.424661] PGD 0 P4D 0
[ 594.424665] Oops: 0002 [#1] SMP PTI
[ 594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[ 594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[ 594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[...]
[ 594.424692] Call Trace:
[ 594.424699] xennet_start_xmit+0x158/0x570

By looking the assembly, this is where it fails:

ffffffff8182d8e0 <xennet_start_xmit>:
[...]
ffffffff8182da0a: 48 83 f8 0d cmp $0xd,%rax
ffffffff8182da0e: 0f 86 46 03 00 00 jbe ffffffff8182dd5a <xennet_start_xmit+0x47a>

### pahole --hex -C netfront_queue usr/lib/debug/boot/vmlinux-5.8.0-1035-aws |grep lock
###### spinlock_t tx_lock; /* 0x1f4 0x4 */

ffffffff8182da14: 49 8d 86 f4 01 00 00 lea 0x1f4(%r14),%rax # <-- #rax = &queue->tx_lock
ffffffff8182da1b: 45 8b 4c 24 70 mov 0x70(%r12),%r9d
ffffffff8182da20: 45 2b 4c 24 74 sub 0x74(%r12),%r9d
ffffffff8182da25: 48 89 c7 mov %rax,%rdi # <-- %rdi = %rax
ffffffff8182da28: 44 89 4d 94 mov %r9d,-0x6c(%rbp)
ffffffff8182da2c: 48 89 85 78 ff ff ff mov %rax,-0x88(%rbp)
ffffffff8182da33: e8 38 52 36 00 callq ffffffff81b92c70 <_raw_spin_lock_irqsave> #<-- OOPS here
[...]

By correlating with the code, we have this in C:

static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
[...]
len = skb_headlen(skb);

spin_lock_irqsave(&queue->tx_lock, flags); // <<< HERE

if (unlikely(!netif_carrier_ok(dev) ||
[...]

Happens that queue->tx_lock is NULL.
What is interesting is the message:

[ 594.291317] ACPI: Hardware changed while hibernated, success doubtful!

So, it means the hibernation woke in a different compute node than it went sleeping. I'm still not 100% sure of why that would cause such OOPS...but I have 2 ideas to either prevent it and validate that hypothesis:

(a) To modprobe unload the xen network driver right before hibernation and load it in the last stage of wake-up - I've done that myself in the hibernation scripts. This would likely prevent this issue.

(b) For testing personnel : maybe you're able to "lock" the testing to sleep/wake-up *always* in the same compute node/instance. If that prevents the issue, we're sure that the difference between the nodes rom sleep/wake-up is triggering some memory corruption in the device queue, that is somewhat propagated to the kernel memory.

Logs:
[  594.291317] ACPI: Hardware changed while hibernated, success doubtful!
[  594.411609] BUG: kernel NULL pointer dereference, address: 00000000000001f4
[  594.424658] #PF: supervisor write access in kernel mode
[  594.424660] #PF: error_code(0x0002) - not-present page
[  594.424661] PGD 0 P4D 0 
[  594.424665] Oops: 0002 [#1] SMP PTI
[  594.424668] CPU: 3 PID: 362 Comm: systemd-timesyn Not tainted 5.8.0-1036-aws #38~20.04.1-Ubuntu
[  594.424669] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
[  594.424675] RIP: 0010:_raw_spin_lock_irqsave+0x23/0x40
[...]
[  594.424692] Call Trace:
[  594.424699]  xennet_start_xmit+0x158/0x570

By looking the assembly, this is where it fails:

ffffffff8182d8e0 <xennet_start_xmit>:
[...]
ffffffff8182da0a:       48 83 f8 0d             cmp    $0xd,%rax
ffffffff8182da0e:       0f 86 46 03 00 00       jbe    ffffffff8182dd5a <xennet_start_xmit+0x47a>

### pahole --hex -C netfront_queue  usr/lib/debug/boot/vmlinux-5.8.0-1035-aws |grep lock
###### spinlock_t                 tx_lock;              /* 0x1f4   0x4 */

ffffffff8182da14:       49 8d 86 f4 01 00 00    lea    0x1f4(%r14),%rax # <-- #rax = &queue->tx_lock
ffffffff8182da1b:       45 8b 4c 24 70          mov    0x70(%r12),%r9d
ffffffff8182da20:       45 2b 4c 24 74          sub    0x74(%r12),%r9d
ffffffff8182da25:       48 89 c7                mov    %rax,%rdi # <-- %rdi = %rax
ffffffff8182da28:       44 89 4d 94             mov    %r9d,-0x6c(%rbp)
ffffffff8182da2c:       48 89 85 78 ff ff ff    mov    %rax,-0x88(%rbp)
ffffffff8182da33:       e8 38 52 36 00          callq  ffffffff81b92c70 <_raw_spin_lock_irqsave> #<-- OOPS here
[...]

By correlating with the code, we have this in C:

static netdev_tx_t xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
[...]
len = skb_headlen(skb);

spin_lock_irqsave(&queue->tx_lock, flags); // <<< HERE

if (unlikely(!netif_carrier_ok(dev) ||
[...]

Happens that queue->tx_lock is NULL.
What is interesting is the message:

[  594.291317] ACPI: Hardware changed while hibernated, success doubtful!

Revision history for this message

Launchpad Janitor (janitor) wrote on 2021-08-31:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-aws (Ubuntu):
status:	New → Confirmed

Revision history for this message

Tim Gardner (timg-tpi) wrote on 2021-08-31:

A workaround is to completely 'down' the network before hibernation. The patches as suggested by Francis Ginther had only 4 failures in 2500 hibernation cycles.

description:

updated

Revision history for this message

Dimitri John Ledkov (xnox) wrote on 2021-11-05:

We already ship lib/systemd/system-sleep/hibinit-agent that does post things in hibinit-resume to workaround things. Including stuff about networking, due to previous issues with xen-netfront.

Given we already restart systemd-networkd, I wonder if we should do better and not keep xen-netfront loaded in the kernel across suspend and resume.

as in
1) extend hibinit-agent.system-sleep to also have pre command
2) in the pre step unload xen-netfront
3) in the post step only load xen-netfront
4) potentially still stop systemd-networkd in pre; which will be restarted after loading xen-netfront in post

Since xen-netfront is causing us issues; and we are already unloading and reloading it; it makes sense to unload it before suspending altogether.

Tim Gardner (timg-tpi) on 2022-07-27

Changed in linux-aws (Ubuntu):
status:	Confirmed → Won't Fix

Report a bug

This report contains Public information

Everyone can see this information.

Duplicates of this bug

Bug #1942223

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubuntulinux-aws package

kernel NULL pointer dereference during xen hibernation

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches

Ubuntu
linux-aws package