Bionic kernel panics in the MAAS ephemeral environment

Bug #1742324 reported by Lee Trager
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
MAAS            Invalid  Critical    Unassigned
linux (Ubuntu)  Expired  Critical    Unassigned

Bug Description

Bionic kernel panics in the MAAS ephemeral environment. The panic is intermittent during commissioning and testing, but it happens every time when trying to enter rescue mode. The test system is a KVM machine using all VirtIO drivers, running on a Bionic host system. Both systems are using the 4.13.0-17-generic kernel.

[ 13.774332] BUG: unable to handle kernel paging request at fffff672c3075a20
[ 13.775203] IP: kfree+0x53/0x160
[ 13.775665] PGD 0
[ 13.775666] P4D 0
[ 13.775965]
[ 13.776566] Oops: 0000 [#1] SMP
[ 13.777004] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi overlay btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ttm drm_kms_helper aesni_intel syscopyarea sysfillrect aes_x86_64 crypto_simd glue_helper cryptd sysimgblt fb_sys_fops drm psmouse virtio_blk virtio_net pata_acpi floppy
[ 13.782468] CPU: 0 PID: 696 Comm: lxd Not tainted 4.13.0-17-generic #20-Ubuntu
[ 13.783169] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 13.784028] task: ffff9afa30b62e80 task.stack: ffffc07640a50000
[ 13.784512] RIP: 0010:kfree+0x53/0x160
[ 13.784884] RSP: 0018:ffff9afa3ca03a18 EFLAGS: 00010286
[ 13.785339] RAX: 0000000000000000 RBX: fffff50a81d68280 RCX: 0000000000000002
[ 13.785983] RDX: 0000457c0321e918 RSI: 0000000000010080 RDI: 00006505c0000000
[ 13.786916] RBP: ffff9afa3ca03a30 R08: 000000000001f4c0 R09: ffffffff94bbb839
[ 13.787877] R10: fffff672c3075a00 R11: 00000000e4e5cd01 R12: ffff9afa3f794000
[ 13.788854] R13: ffffffff947a155e R14: ffff9afa3f794000 R15: ffff9afa3f794000
[ 13.789882] FS: 00007fc4a7938cc0(0000) GS:ffff9afa3ca00000(0000) knlGS:0000000000000000
[ 13.791134] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 13.792003] CR2: fffff672c3075a20 CR3: 00000000737f6000 CR4: 00000000001406f0
[ 13.793019] Call Trace:
[ 13.793462] <IRQ>
[ 13.793886] security_sk_free+0x3e/0x50
[ 13.794680] __sk_destruct+0x108/0x190
[ 13.795465] sk_destruct+0x20/0x30
[ 13.796193] __sk_free+0x82/0xa0
[ 13.796899] sk_free+0x19/0x20
[ 13.797537] sock_put+0x14/0x20
[ 13.797926] tcp_v4_rcv+0x94d/0x9d0
[ 13.798325] ? ep_poll_callback+0x226/0x2b0
[ 13.798704] ip_local_deliver_finish+0x5c/0x1f0
[ 13.799099] ip_local_deliver+0x6f/0xe0
[ 13.799455] ip_rcv_finish+0x120/0x410
[ 13.799868] ip_rcv+0x28c/0x3a0
[ 13.800221] ? csum_partial+0x11/0x20
[ 13.800649] __netif_receive_skb_core+0x39a/0xaa0
[ 13.801295] ? skb_checksum+0x35/0x50
[ 13.801742] ? skb_append_datato_frags+0x200/0x200
[ 13.802164] ? reqsk_fastopen_remove+0x140/0x140
[ 13.802576] __netif_receive_skb+0x18/0x60
[ 13.803163] ? __netif_receive_skb+0x18/0x60
[ 13.803798] netif_receive_skb_internal+0x3f/0x3f0
[ 13.804538] ? dev_gro_receive+0x2dc/0x480
[ 13.805210] napi_gro_receive+0xc2/0xe0
[ 13.805851] receive_buf+0x218/0xf70 [virtio_net]
[ 13.806591] ? vring_unmap_one+0x1b/0x80
[ 13.807240] virtnet_poll+0x173/0x268 [virtio_net]
[ 13.807998] net_rx_action+0x13b/0x380
[ 13.808621] ? skb_recv_done+0x30/0x40 [virtio_net]
[ 13.810159] __do_softirq+0xde/0x2a5
[ 13.810673] irq_exit+0xb6/0xc0
[ 13.811185] do_IRQ+0x80/0xd0
[ 13.811881] common_interrupt+0x89/0x89
[ 13.812668] RIP: 0010:filemap_fault+0xff/0x5e0
[ 13.819490] RSP: 0018:ffffc07640a53cc8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff8e
[ 13.820174] RAX: 000000000012e288 RBX: ffff9afa3f6e0800 RCX: 0000000000000054
[ 13.820760] RDX: 0000000000000000 RSI: ffff9afa30b64180 RDI: ffff9afa3ca1c200
[ 13.821339] RBP: ffffc07640a53d70 R08: 0000000000000310 R09: 0000000000000001
[ 13.821917] R10: ffffc076406ab738 R11: 00000000000000d2 R12: 00000000000000ea
[ 13.822493] R13: ffff9afa3262e900 R14: fffff50a8174eec0 R15: ffffc07640a53db8
[ 13.823075] </IRQ>
[ 13.823327] ? filemap_fault+0xb0/0x5e0
[ 13.823703] ? filemap_map_pages+0x179/0x320
[ 13.824101] __do_fault+0x1e/0xb0
[ 13.824434] __handle_mm_fault+0xba7/0x1020
[ 13.824821] handle_mm_fault+0xb1/0x200
[ 13.825189] __do_page_fault+0x24d/0x4d0
[ 13.825560] trace_do_page_fault+0x37/0xd0
[ 13.826051] do_async_page_fault+0x51/0x80
[ 13.826652] async_page_fault+0x28/0x30
[ 13.827246] RIP: 0033:0x7fc4a68e38a7
[ 13.827818] RSP: 002b:00007ffea4ae47e8 EFLAGS: 00010206
[ 13.828303] RAX: 0000000000000000 RBX: 00007fc4a4f610e0 RCX: 0000000000000018
[ 13.828883] RDX: 00007fc4a4d1d9b8 RSI: 00007fc4a7063190 RDI: 00007fc4a4d1d9a0
[ 13.829461] RBP: 00007fc4a4cf0fc0 R08: 00000000000000ca R09: 0000000000000001
[ 13.830060] R10: 0000000000000000 R11: 000000000000004d R12: 00007ffea4ae48a0
[ 13.830637] R13: 00007ffea4ae4980 R14: 00007fc4a4f4d030 R15: 00007fc4a4d1d9b8
[ 13.831227] Code: 00 80 49 01 da 0f 82 1c 01 00 00 48 c7 c7 00 00 00 80 48 2b 3d 7f e6 c1 00 49 01 fa 49 c1 ea 0c 49 c1 e2 06 4c 03 15 5d e6 c1 00 <49> 8b 42 20 48 8d 50 ff a8 01 4c 0f 45 d2 49 8b 52 20 48 8d 42
[ 13.832971] RIP: kfree+0x53/0x160 RSP: ffff9afa3ca03a18
[ 13.833525] CR2: fffff672c3075a20
[ 13.833920] ---[ end trace 1b0852e1ad381bc3 ]---
[ 13.834350] Kernel panic - not syncing: Fatal exception in interrupt
[ 13.835085] Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 13.836253] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
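
For anyone triaging the trace above, here is a minimal Python sketch of how the oops can be picked apart. It is an illustration only, not a tool used in this report: the `parse_oops` helper and its regular expressions are hypothetical, and the excerpt is a handful of lines copied from the console log. It separates frames the unwinder trusts from those marked `?` (found on the stack but not verified as part of the call chain), compares the faulting address with R10, and subtracts the reported Kernel Offset from R13 to get the address as it would appear in an unrelocated symbol table.

```python
import re

# A few lines copied from the console log above.
OOPS = """\
[ 13.774332] BUG: unable to handle kernel paging request at fffff672c3075a20
[ 13.775203] IP: kfree+0x53/0x160
[ 13.787877] R10: fffff672c3075a00 R11: 00000000e4e5cd01 R12: ffff9afa3f794000
[ 13.788854] R13: ffffffff947a155e R14: ffff9afa3f794000 R15: ffff9afa3f794000
[ 13.793886] security_sk_free+0x3e/0x50
[ 13.794680] __sk_destruct+0x108/0x190
[ 13.798325] ? ep_poll_callback+0x226/0x2b0
[ 13.805851] receive_buf+0x218/0xf70 [virtio_net]
[ 13.835085] Kernel Offset: 0x13400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
"""

TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# "symbol+0xoff/0xsize", optionally prefixed by "? " and followed by "[module]".
FRAME = re.compile(r"^(\? )?([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?$")
# General-purpose registers printed as "R10: <16 hex digits>".
REGISTER = re.compile(r"\b(R[A-Z0-9]{2}): ([0-9a-f]{16})\b")


def parse_oops(text):
    """Extract frames, registers, fault address and KASLR offset (hypothetical helper)."""
    info = {"frames": [], "regs": {}, "fault": None, "offset": None}
    for raw in text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        frame = FRAME.match(line)
        if frame:
            info["frames"].append({
                "symbol": frame.group(2),
                "offset": int(frame.group(3), 16),
                "module": frame.group(5),            # None for built-in code
                "reliable": frame.group(1) is None,  # "?" marks an unverified frame
            })
            continue
        fault = re.search(r"paging request at ([0-9a-f]+)", line)
        if fault:
            info["fault"] = int(fault.group(1), 16)
        offset = re.search(r"Kernel Offset: 0x([0-9a-f]+)", line)
        if offset:
            info["offset"] = int(offset.group(1), 16)
        for name, value in REGISTER.findall(line):
            # Keep the first dump only: it belongs to the faulting context.
            info["regs"].setdefault(name, int(value, 16))
    return info


if __name__ == "__main__":
    info = parse_oops(OOPS)
    for f in info["frames"]:
        mark = " " if f["reliable"] else "?"
        print(mark, f["symbol"], f["module"] or "")
    print("fault - R10 =", hex(info["fault"] - info["regs"]["R10"]))
    print("R13 without KASLR offset =", hex(info["regs"]["R13"] - info["offset"]))
```

On the full log, the faulting address (CR2) is R10 + 0x20, which matches the marked bytes `<49> 8b 42 20` in the Code: line, a load from `0x20(%r10)`. So kfree faulted while dereferencing an address computed from the pointer it was given. Whether that pointer was already freed or was never valid cannot be told from the trace alone.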

Lee Trager (ltrager) wrote :
description: updated
Changed in maas:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 2.4.0alpha1
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1742324

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Lee Trager (ltrager) wrote :

The kernel panic happens before I can access the system. The only log I am able to pull is the console log, which is attached.

tags: added: bionic
removed: artful
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Are you able to install any test kernels on the system to perform some debugging?

Changed in linux (Ubuntu):
status: Confirmed → Triaged
importance: Undecided → Critical
tags: added: kernel-da-key
tags: added: kernel-key
removed: kernel-da-key
Blake Rouse (blake-rouse) wrote :

This kernel is PXE booting from MAAS. We might be able to modify the kernel on disk that pxelinux.0 pulls to get you more information.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest mainline kernel?

 http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc7

Lee Trager (ltrager) wrote :

4.15-rc7 works with MAAS. I've commissioned, tested, and entered rescue mode without hitting any kernel panics.

Joseph Salisbury (jsalisbury) wrote :

Thanks for the update. We can perform a "Reverse" bisect to identify the commit that fixes this bug in mainline. We would first need to identify the last kernel version that had the bug and the first version that did not have the bug. We now know that v4.15-rc7 is fixed. Can you next test v4.15-rc1? It can be downloaded from:

 http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.15-rc1
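
The search described above can be sketched as a binary search over an ordered list of kernel versions. This is a rough illustration, not the kernel team's tooling: `first_fixed`, the `TAGS` list, and the pretend fix point are all hypothetical. The thread only establishes that the Ubuntu 4.13.0-17 kernel panics and that v4.15-rc7 does not; treating mainline v4.13 as broken is an assumption, and each `is_fixed` call stands for a manual boot test in MAAS.

```python
def first_fixed(versions, is_fixed):
    """Return the earliest version for which is_fixed(version) is true.

    Assumes versions are ordered oldest to newest, the oldest is known
    broken, the newest is known fixed, and the fix is never reverted
    in between.
    """
    lo, hi = 0, len(versions) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid   # fix landed at or before mid
        else:
            lo = mid   # still panics, so the fix landed after mid
    return versions[hi]


# Hypothetical candidates between the assumed-broken and known-fixed kernels.
TAGS = ["v4.13", "v4.14-rc1", "v4.14", "v4.15-rc1", "v4.15-rc4", "v4.15-rc7"]

if __name__ == "__main__":
    # Pretend, purely for illustration, that the fix first appears in v4.15-rc1.
    pretend_fixed = lambda tag: TAGS.index(tag) >= TAGS.index("v4.15-rc1")
    print(first_fixed(TAGS, pretend_fixed))
```

It is called a reverse bisect because the search looks for the first good commit rather than the first bad one. With git, the same idea can be expressed using custom terms, for example `git bisect start --term-old=broken --term-new=fixed`.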

Joseph Salisbury (jsalisbury) wrote :

It might also be worthwhile to test the latest upstream 4.13 kernel:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13.16/

Lee Trager (ltrager)
Changed in maas:
status: Triaged → Won't Fix
tags: added: kernel-da-key
removed: kernel-key
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in maas:
status: Won't Fix → Incomplete
status: Incomplete → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired