[x86_64] Kernel panic when building on NFS in VM

Bug #1745608 reported by Jonas Hahnfeld
This bug affects 2 people
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  High        Unassigned

Bug Description

Since around the beginning of December (definitely before any KPTI patches were merged), I have been having problems building AOSP on an NFS share that is mounted in a VM running Ubuntu 16.04.3. After a few minutes the system freezes completely: the SSH session becomes unresponsive and I can't open a new one. Logging in via virt-manager's UI isn't possible either.

When I connected to the serial port, I got the following kernel panic stack trace:
[ 2531.058622] PANIC: double fault, error_code: 0x0
[ 2531.073231] Kernel panic - not syncing: Machine halted.
[ 2531.074309] CPU: 2 PID: 16421 Comm: kworker/2:0H Not tainted 4.4.0-112-generic #135-Ubuntu
[ 2531.076122] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 2531.080374] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 2531.081818] 0000000000000086 3c797ca978191564 ffff88013fd07e80 ffffffff813fc233
[ 2531.083022] ffffffff81cba603 ffff88013fd07f18 ffff88013fd07f08 ffffffff8118f1c7
[ 2531.084382] 0000000000000008 ffff88013fd07f18 ffff88013fd07eb0 3c797ca978191564
[ 2531.086127] Call Trace:
[ 2531.086928] <#DF> [<ffffffff813fc233>] dump_stack+0x63/0x90
[ 2531.088606] [<ffffffff8118f1c7>] panic+0xd3/0x215
[ 2531.091309] [<ffffffff81060d5d>] df_debug+0x2d/0x30
[ 2531.092424] [<ffffffff8102fb8c>] do_double_fault+0x7c/0xf0
[ 2531.093294] [<ffffffff81849288>] double_fault+0x28/0x30
[ 2531.094170] [<ffffffff8172a93d>] ? __build_skb+0x9d/0xe0
[ 2531.095017] <<EOE>> <IRQ> [<ffffffff8172aad0>] __netdev_alloc_skb+0xc0/0x100
[ 2531.096281] [<ffffffff8160a225>] page_to_skb+0x65/0x340
[ 2531.097120] [<ffffffff8160c6c5>] virtnet_receive+0x2a5/0x8f0
[ 2531.098055] [<ffffffff8160c9f3>] ? virtnet_receive+0x5d3/0x8f0
[ 2531.099023] [<ffffffff8160cd2d>] virtnet_poll+0x1d/0x80
[ 2531.100100] [<ffffffff8173a14e>] net_rx_action+0x21e/0x360
[ 2531.100894] [<ffffffff8160a103>] ? skb_recv_done+0x43/0x50
[ 2531.101765] [<ffffffff81086a61>] __do_softirq+0x101/0x290
[ 2531.102619] [<ffffffff81086d63>] irq_exit+0xa3/0xb0
[ 2531.103511] [<ffffffff8184aae4>] do_IRQ+0x54/0xd0
[ 2531.104453] [<ffffffff81846426>] common_interrupt+0x1a6/0x1a6
[ 2531.105459] <EOI> [<ffffffff811ee43f>] ? new_slab+0x29f/0x490
[ 2531.106516] [<ffffffff811ee525>] ? new_slab+0x385/0x490
[ 2531.107428] [<ffffffff811ef47b>] ___slab_alloc+0x22b/0x470
[ 2531.108372] [<ffffffffc0327a74>] ? rpc_malloc+0x34/0xa0 [sunrpc]
[ 2531.109389] [<ffffffffc0323721>] ? xs_sendpages+0x61/0x1d0 [sunrpc]
[ 2531.110693] [<ffffffffc032c34e>] ? rpcauth_lookup_credcache+0xbe/0x2a0 [sunrpc]
[ 2531.112050] [<ffffffffc0327a74>] ? rpc_malloc+0x34/0xa0 [sunrpc]
[ 2531.113174] [<ffffffff811ef6e0>] __slab_alloc+0x20/0x40
[ 2531.113906] [<ffffffff811f1075>] __kmalloc+0x1d5/0x250
[ 2531.114649] [<ffffffffc0327a74>] rpc_malloc+0x34/0xa0 [sunrpc]
[ 2531.115483] [<ffffffffc031cf92>] call_allocate+0xc2/0x1b0 [sunrpc]
[ 2531.116353] [<ffffffffc031ced0>] ? call_reserveresult+0x120/0x120 [sunrpc]
[ 2531.117298] [<ffffffffc031ced0>] ? call_reserveresult+0x120/0x120 [sunrpc]
[ 2531.118236] [<ffffffffc0328551>] __rpc_execute+0x91/0x470 [sunrpc]
[ 2531.119085] [<ffffffffc0328945>] rpc_async_schedule+0x15/0x20 [sunrpc]
[ 2531.120327] [<ffffffff8109b815>] process_one_work+0x165/0x480
[ 2531.121364] [<ffffffff8109bb7b>] worker_thread+0x4b/0x4d0
[ 2531.122340] [<ffffffff8109bb30>] ? process_one_work+0x480/0x480
[ 2531.123696] [<ffffffff810a1eb5>] kthread+0xe5/0x100
[ 2531.124717] [<ffffffff810a1dd0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2531.126057] [<ffffffff81845b9f>] ret_from_fork+0x3f/0x70
[ 2531.127192] [<ffffffff810a1dd0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 2531.129000] Kernel Offset: disabled
[ 2531.160053] ---[ end Kernel panic - not syncing: Machine halted.

I then tried the HWE kernel, with the same result but a different stack trace:
[ 3643.637138] PANIC: double fault, error_code: 0x0
[ 3643.650655] Kernel panic - not syncing: Machine halted.
[ 3643.651474] CPU: 0 PID: 47 Comm: kswapd0 Not tainted 4.13.0-32-generic #35~16.04.1-Ubuntu
[ 3643.653108] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 3643.654974] Call Trace:
[ 3643.655384] <#DF>
[ 3643.655740] dump_stack+0x63/0x8b
[ 3643.656337] panic+0xe4/0x23d
[ 3643.656870] df_debug+0x2d/0x30
[ 3643.657421] do_double_fault+0x9a/0x130
[ 3643.658119] double_fault+0x22/0x30
[ 3643.658722] RIP: 0010:_raw_spin_trylock+0x9/0x30
[ 3643.659518] RSP: 0000:ffffa3ec408a3bf8 EFLAGS: 00010292
[ 3643.660404] RAX: 000000000040008c RBX: ffff8c31683d1f00 RCX: ffffa3ec408a3c50
[ 3643.661664] RDX: 0000000000000001 RSI: ffff8c31683d19b0 RDI: ffff8c3166d1e720
[ 3643.662927] RBP: ffffa3ec408a3bf8 R08: ffffa3ec408a3c50 R09: ffffa3ec408a3d50
[ 3643.664166] R10: 000000000004dde0 R11: 0000000000000040 R12: ffff8c3166cb7f58
[ 3643.665359] R13: ffffa3ec408a3c50 R14: ffff8c3166cb7f00 R15: ffff8c3166cb7f80
[ 3643.666573] </#DF>
[ 3643.666948] shrink_dentry_list+0x101/0x2e0
[ 3643.667679] prune_dcache_sb+0x5a/0x80
[ 3643.668330] super_cache_scan+0x119/0x1a0
[ 3643.669025] shrink_slab.part.48+0x1fa/0x420
[ 3643.669760] shrink_slab+0x29/0x30
[ 3643.670357] shrink_node+0x108/0x310
[ 3643.670995] kswapd+0x32a/0x770
[ 3643.671547] kthread+0x109/0x140
[ 3643.672115] ? mem_cgroup_shrink_node+0x180/0x180
[ 3643.672924] ? kthread_create_on_node+0x70/0x70
[ 3643.673707] ret_from_fork+0x1f/0x30
[ 3643.674451] Kernel Offset: 0x9c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3643.676229] ---[ end Kernel panic - not syncing: Machine halted.

Please let me know if you need further information, I can reproduce this pretty reliably...
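
To compare traces like the ones above side by side, it can help to reduce each one to the kernel version, the process name, and the list of reliable stack frames. The following is only a minimal sketch, assuming the serial console output has been saved as text; `summarize_panic` and the regular expressions are hypothetical helpers written against the two trace formats shown in this report, not an existing tool.

```python
import re

# "Comm: <name> ... <kernel release>" as printed in the "CPU: ..." header line.
HEADER_RE = re.compile(r"Comm: (?P<comm>\S+) .*?(?P<kernel>\d+\.\d+\.\d+-\S+)")

# "symbol+0xoff/0xlen", optionally preceded by "? " (an unreliable frame).
FRAME_RE = re.compile(
    r"(?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Frames that belong to the double-fault reporting path, not to the failing code.
HANDLER_FRAMES = {"dump_stack", "panic", "df_debug", "do_double_fault", "double_fault"}


def summarize_panic(log_text):
    """Return the kernel release, process name and reliable frames of one trace."""
    summary = {"kernel": None, "comm": None, "frames": []}
    for line in log_text.splitlines():
        header = HEADER_RE.search(line)
        if header and summary["kernel"] is None:
            summary["kernel"] = header["kernel"]
            summary["comm"] = header["comm"]
            continue
        frame = FRAME_RE.search(line)
        # Skip "? symbol" entries and the panic-handling frames themselves.
        if frame and not frame["guess"] and frame["sym"] not in HANDLER_FRAMES:
            summary["frames"].append(frame["sym"])
    return summary
```

Fed the 4.13 trace above, the first entry in `frames` is `_raw_spin_trylock`, the symbol from the `RIP:` line. The 4.4 traces do not print a `RIP:` line inside the `<#DF>` section, so for those the first entry is simply the first frame not marked with `?`.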

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1745608

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Jonas Hahnfeld (hahnjo) wrote :

It's a kernel panic rendering the system unusable, so I can't run any diagnostic commands.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.15 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag: 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Jonas Hahnfeld (hahnjo) wrote :

Got something similar:
[ 2570.763287] PANIC: double fault, error_code: 0x0
[ 2570.764564] Kernel panic - not syncing: Machine halted.
[ 2570.765428] CPU: 0 PID: 22272 Comm: as Not tainted 4.15.0-041500-generic #201801282230
[ 2570.767059] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 2570.768815] Call Trace:
[ 2570.769263] <#DF>
[ 2570.769643] dump_stack+0x63/0x8b
[ 2570.770251] panic+0xe4/0x234
[ 2570.770787] df_debug+0x2d/0x30
[ 2570.771342] do_double_fault+0xa1/0x130
[ 2570.772031] double_fault+0x22/0x30
[ 2570.772583] RIP: 0010:clear_page_erms+0x7/0x10
[ 2570.773303] RSP: 0018:ffffb98e428d78a0 EFLAGS: 00010246
[ 2570.774242] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000001000
[ 2570.775515] RDX: ffff90327df4c500 RSI: ffffdb5084a10fc0 RDI: ffff90332843f000
[ 2570.776837] RBP: ffffb98e428d79b8 R08: 0000000000000000 R09: ffffdb5084a11000
[ 2570.778144] R10: 0000000000000000 R11: ffffb98e428d7cc0 R12: ffff90333ffd2d00
[ 2570.779437] R13: ffffb98e428d79c8 R14: 00000000014280ca R15: ffffdb5084a10fc0
[ 2570.780712] </#DF>
[ 2570.781098] ? get_page_from_freelist+0xf0d/0x1400
[ 2570.781958] ? try_to_wake_up+0x59/0x480
[ 2570.782709] __alloc_pages_nodemask+0xfb/0x280
[ 2570.783652] alloc_pages_vma+0x88/0x1f0
[ 2570.784354] __handle_mm_fault+0x938/0x11e0
[ 2570.785128] handle_mm_fault+0xb1/0x200
[ 2570.785814] __do_page_fault+0x257/0x4d0
[ 2570.786528] do_page_fault+0x2e/0xd0
[ 2570.787171] do_async_page_fault+0x51/0x80
[ 2570.787902] async_page_fault+0x2c/0x60
[ 2570.788589] RIP: 0010:copy_user_enhanced_fast_string+0xe/0x20
[ 2570.789624] RSP: 0018:ffffb98e428d7cd8 EFLAGS: 00050206
[ 2570.790584] RAX: 00007f5c60aff010 RBX: 0000000004a26f00 RCX: 0000000000000010
[ 2570.791858] RDX: 0000000000001000 RSI: ffff9033289bcff0 RDI: 00007f5c60aff000
[ 2570.793076] RBP: ffffb98e428d7ce0 R08: ffffb98e428d7e50 R09: ffffdb5084a26f00
[ 2570.794161] R10: 0000000000000040 R11: ffffb98e428d7cc0 R12: 0000000000001000
[ 2570.795252] R13: ffffb98e428d7e88 R14: ffff9033289bc000 R15: 0000000000001000
[ 2570.796339] ? copyout+0x26/0x30
[ 2570.796867] copy_page_to_iter+0x10c/0x2f0
[ 2570.797521] generic_file_read_iter+0x44b/0xbe0
[ 2570.798239] ? page_cache_tree_insert+0xe0/0xe0
[ 2570.798969] nfs_file_read+0x6e/0xc0 [nfs]
[ 2570.799667] __vfs_read+0xee/0x160
[ 2570.800262] vfs_read+0x8e/0x130
[ 2570.800817] SyS_read+0x55/0xc0
[ 2570.801358] entry_SYSCALL_64_fastpath+0x24/0x87
[ 2570.802156] RIP: 0033:0x7f5c614b4260
[ 2570.802768] RSP: 002b:00007ffc41b1b638 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 2570.804043] RAX: ffffffffffffffda RBX: 00007f5c61781b20 RCX: 00007f5c614b4260
[ 2570.805242] RDX: 0000000000100000 RSI: 00007f5c60ada010 RDI: 0000000000000003
[ 2570.806445] RBP: 0000000000100010 R08: 00000000016bf380 R09: 0000000000000054
[ 2570.807645] R10: 00007f5c61db8700 R11: 0000000000000246 R12: 0000000000101000
[ 2570.808845] R13: 00007f5c61781b78 R14: 0000000000001000 R15: 0000000000000040
[ 2570.810251] Kernel Offset: 0x2d400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2570.812045] ---[ end Kernel panic - not syncing: Machine...


tags: added: kernel-bug-exists-upstream
Jonas Hahnfeld (hahnjo) wrote :

Next I'm trying linux-image-4.4.0-98-generic, which should be the last known-good kernel according to the cached packages in /var/cache/apt/archives, assuming the problem is related only to a kernel upgrade...
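
For reference, the cached kernel packages can be listed in version order with a short script. This is a rough sketch under a few assumptions: the helper name `cached_kernel_images` is made up, and it assumes the usual Ubuntu file naming of the form `linux-image-<version>-<abi>-generic_<package version>_<arch>.deb`.

```python
import glob
import os
import re

# Matches e.g. "linux-image-4.4.0-98-generic_..." but not "linux-image-extra-...".
IMAGE_RE = re.compile(r"linux-image-(\d+)\.(\d+)\.(\d+)-(\d+)-generic_")


def cached_kernel_images(archive_dir="/var/cache/apt/archives"):
    """Return the generic kernel releases found in archive_dir, oldest first."""
    versions = []
    for path in glob.glob(os.path.join(archive_dir, "linux-image-*-generic_*.deb")):
        match = IMAGE_RE.match(os.path.basename(path))
        if match:
            # Compare numerically so that ABI 98 sorts before ABI 112.
            versions.append(tuple(int(part) for part in match.groups()))
    return ["{}.{}.{}-{}-generic".format(*v) for v in sorted(versions)]
```

The list only shows which kernel packages apt has downloaded and not yet cleaned from the cache; it does not by itself prove that a given version was ever booted or was free of the problem.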

Jonas Hahnfeld (hahnjo) wrote :

That's weird:
[ 2691.471972] PANIC: double fault, error_code: 0x0
[ 2691.477488] Kernel panic - not syncing: Machine halted.
[ 2691.478724] CPU: 1 PID: 31425 Comm: cc1 Not tainted 4.4.0-98-generic #121-Ubuntu
[ 2691.479962] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 2691.481621] 0000000000000086 dbc9cc8a876ccdc2 ffff88013fc84e80 ffffffff813fb2c3
[ 2691.483041] ffffffff81cb9bfd ffff88013fc84f18 ffff88013fc84f08 ffffffff8118df77
[ 2691.484325] 0000000000000008 ffff88013fc84f18 ffff88013fc84eb0 dbc9cc8a876ccdc2
[ 2691.485654] Call Trace:
[ 2691.486077] <#DF> [<ffffffff813fb2c3>] dump_stack+0x63/0x90
[ 2691.487057] [<ffffffff8118df77>] panic+0xd3/0x215
[ 2691.487880] [<ffffffff81060d5d>] df_debug+0x2d/0x30
[ 2691.488677] [<ffffffff8102fb8c>] do_double_fault+0x7c/0xf0
[ 2691.489620] [<ffffffff81846238>] double_fault+0x28/0x30
[ 2691.490537] [<ffffffff81407d07>] ? clear_page_c_e+0x7/0x10
[ 2691.491487] <<EOE>> [<ffffffff811987f4>] ? get_page_from_freelist+0x454/0xa50
[ 2691.492736] [<ffffffff8119f491>] ? pagevec_lookup_tag+0x21/0x30
[ 2691.493743] [<ffffffff8119b8e3>] ? write_cache_pages+0x123/0x510
[ 2691.494756] [<ffffffff81199b79>] __alloc_pages_nodemask+0x159/0x2a0
[ 2691.495826] [<ffffffff811e4fad>] alloc_pages_vma+0xad/0x250
[ 2691.496775] [<ffffffff811bf54f>] wp_page_copy.isra.56+0x38f/0x550
[ 2691.497802] [<ffffffff81199b79>] ? __alloc_pages_nodemask+0x159/0x2a0
[ 2691.498890] [<ffffffff811c0ea8>] do_wp_page+0xd8/0x6c0
[ 2691.499881] [<ffffffff810b5403>] ? update_curr+0xe3/0x160
[ 2691.501148] [<ffffffff811c2734>] handle_mm_fault+0xcf4/0x1820
[ 2691.502373] [<ffffffff810bab5c>] ? set_next_entity+0x9c/0xb0
[ 2691.503540] [<ffffffff8106b577>] __do_page_fault+0x197/0x400
[ 2691.504769] [<ffffffff8106b847>] trace_do_page_fault+0x37/0xe0
[ 2691.505859] [<ffffffff81063f29>] do_async_page_fault+0x19/0x70
[ 2691.507364] [<ffffffff81846868>] async_page_fault+0x28/0x30
[ 2691.508921] Kernel Offset: disabled
[ 2691.518757] ---[ end Kernel panic - not syncing: Machine halted.

Any ideas?

Jonas Hahnfeld (hahnjo) wrote :

Any updates on this?
