kernel panic in nested VM with latest 4.15 kernel (4.15.0-24-generic)

Bug #1780817 reported by Louis Bouchard on 2018-07-09
This bug affects 3 people
Affects           Importance   Assigned to
linux (Ubuntu)    High         Unassigned
Bionic            High         Unassigned

Bug Description

When starting a nested VM in a Bionic VM, the "host" VM kernel panics after simply dropping to the QEMU monitor with <Ctrl>A-c. For some reason kdump is unable to capture the kernel panic, so I only have a screen capture of the panic.

It also happens on the latest mainline kernel (4.18-rc4). It is fairly trivial to reproduce. In a Bionic VM, install qemu and ovmf, then run the following:

qemu-system-x86_64 -enable-kvm \
                    -name balloontest \
                    -display none \
                    -monitor none \
                    -nographic \
                    -nodefaults \
                    -m 2048M \
                    -serial mon:stdio \
                    -smp 2 \
                    -cpu host \
                    -drive if=pflash,format=raw,readonly,file=/usr/share/OVMF/OVMF_CODE.fd \
                    -drive if=pflash,format=raw,file=/home/caribou/balloon/efi.vars

Use <Ctrl>A-c to drop to QEMU monitor and <quit>.
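
Before running the reproducer it may help to confirm that nested virtualization is actually available inside the Bionic VM, since -enable-kvm depends on it. The following is a minimal sketch, assuming an Intel CPU and the kvm_intel module (the module seen in the trace); the function name nested_enabled and the NESTED_PARAM override are hypothetical helpers, not part of the original report.

```shell
# Module parameter exposed by kvm_intel (an AMD system would use kvm_amd instead).
nested_param=${NESTED_PARAM:-/sys/module/kvm_intel/parameters/nested}

# Hypothetical helper: succeeds when the given parameter file reports
# nested virtualization as enabled ("Y" or "1").
nested_enabled() {
    case "$(cat "$1" 2>/dev/null)" in
        Y|1) return 0 ;;
        *)   return 1 ;;
    esac
}

if nested_enabled "$nested_param"; then
    echo "nested virtualization: enabled"
else
    echo "nested virtualization: disabled, or kvm_intel is not loaded"
fi
```

If the check reports "disabled", the qemu command above would be expected to fail at -enable-kvm rather than reach the panic.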

[ 267.784299] general protection fault: 0000 [#1] SMP PTI
[ 267.785834] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter i>
[ 267.804312] xor raid6_pq libcrc32c raid1 raid0 multipath linear floppy aesni_intel pata_acpi aes_x86_64 crypto_simd cryptd glue_helper psmouse i2c_piix4 virtio_net virtio_blk
[ 267.807946] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.15.0-24-generic #26-Ubuntu
[ 267.809710] Hardware name: Scaleway Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 267.811666] RIP: 0010:native_write_cr4+0x4/0x10
[ 267.812727] RSP: 0018:ffff8c01ffd83f48 EFLAGS: 00010006
[ 267.813960] RAX: 00000000003626e0 RBX: 0000000000000046 RCX: ffff8c01ffd80000
[ 267.815582] RDX: ffff8c01ffd94020 RSI: ffff8c01ffda5040 RDI: 00000000003606e0
[ 267.817095] RBP: ffff8c01ffd83f48 R08: 000000478079a547 R09: 0000000000000000
[ 267.818625] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000025040
[ 267.820130] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 267.821638] FS: 0000000000000000(0000) GS:ffff8c01ffd80000(0000) knlGS:0000000000000000
[ 267.823352] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 267.824592] CR2: 0000000000000000 CR3: 000000058c00a006 CR4: 00000000003626e0
[ 267.826108] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 267.827567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 267.828973] Call Trace:
[ 267.829470] <IRQ>
[ 267.829893] hardware_disable+0xaa/0xc0 [kvm_intel]
[ 267.830897] kvm_arch_hardware_disable+0x19/0x40 [kvm]
[ 267.831928] hardware_disable_nolock+0x2b/0x30 [kvm]
[ 267.832912] flush_smp_call_function_queue+0x4c/0xf0
[ 267.833911] generic_smp_call_function_single_interrupt+0x13/0x30
[ 267.835121] smp_call_function_interrupt+0x36/0xd0
[ 267.836069] call_function_interrupt+0x84/0x90
[ 267.836950] </IRQ>
[ 267.837396] RIP: 0010:native_safe_halt+0x6/0x10
[ 267.838272] RSP: 0018:ffffa2e1431afe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff03
[ 267.839698] RAX: ffffffffacd97150 RBX: 0000000000000006 RCX: 0000000000000000
[ 267.840978] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 267.842258] RBP: ffffa2e1431afe80 R08: 0000000000000002 R09: 0000000000000000
[ 267.843554] R10: 00000000000000b3 R11: 00000000000000a6 R12: 0000000000000006
[ 267.844826] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 267.846107] ? __cpuidle_text_start+0x8/0x8
[ 267.846875] default_idle+0x20/0x100
[ 267.847533] arch_cpu_idle+0x15/0x20
[ 267.848186] default_idle_call+0x23/0x30
[ 267.848917] do_idle+0x172/0x1f0
[ 267.849516] cpu_startup_entry+0x73/0x80
[ 267.850255] start_secondary+0x1ab/0x200
[ 267.850971] secondary_startup_64+0xa5/0xb0
[ 267.851700] Code: 0f 1f 80 00 00 00 00 55 48 89 e5 0f 20 d8 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 22 df 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 <0f> 22 e7 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 44 0f 20 c0>
[ 267.854894] RIP: native_write_cr4+0x4/0x10 RSP: ffff8c01ffd83f48
[ 268.848104] invalid opcode: 0000 [#2] SMP PTI
[ 268.848524] Modules linked in: nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_filter i>
[ 268.854013] xor raid6_pq libcrc32c raid1 raid0 multipath linear floppy aesni_intel pata_acpi aes_x86_64 crypto_simd cryptd glue_helper psmouse i2c_piix4 virtio_net virtio_blk
[ 268.855212] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.15.0-24-generic #26-Ubuntu
[ 268.855790] Hardware name: Scaleway Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 268.856438] RIP: 0010:native_machine_crash_shutdown+0x136/0x190
[ 268.856893] RSP: 0018:ffff8c01ffd83cc0 EFLAGS: 00010002
[ 268.857295] RAX: 00000000003626e0 RBX: ffff8c01ffd83d28 RCX: 00000000fffa3223
[ 268.857837] RDX: 000000000f8bfbff RSI: ffff8c01ffd83cc8 RDI: ffff8c01ffd83cc4
[ 268.858426] RBP: ffff8c01ffd83cf0 R08: ffff8c01ffd83ccc R09: ffff8c01ffd83cd0
[ 268.859029] R10: ffffffffada5d660 R11: ffff8c01ffd83c7c R12: 000000000000000b
[ 268.859573] R13: ffff8c01ffd83e98 R14: 0000000000000000 R15: 0000000000000000
[ 268.860117] FS: 0000000000000000(0000) GS:ffff8c01ffd80000(0000) knlGS:0000000000000000
[ 268.860735] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 268.861177] CR2: 0000000000000000 CR3: 000000058c00a006 CR4: 00000000003626e0
[ 268.861724] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 268.862268] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 268.862820] Call Trace:
[ 268.863016] <IRQ>
[ 268.863181] kvm_crash_shutdown+0x26/0x50
[ 268.863520] machine_crash_shutdown+0x15/0x20
[ 268.863885] __crash_kexec+0x5d/0xa0
[ 268.864167] ? native_write_cr4+0x4/0x10
[ 268.864473] crash_kexec+0x41/0x60
[ 268.864739] oops_end+0xa8/0xd0
[ 268.864985] die+0x42/0x50
[ 268.865203] do_general_protection+0x9d/0x180
[ 268.865542] general_protection+0x25/0x50
[ 268.865855] RIP: 0010:native_write_cr4+0x4/0x10
[ 268.866204] RSP: 0018:ffff8c01ffd83f48 EFLAGS: 00010006
[ 268.866610] RAX: 00000000003626e0 RBX: 0000000000000046 RCX: ffff8c01ffd80000
[ 268.867152] RDX: ffff8c01ffd94020 RSI: ffff8c01ffda5040 RDI: 00000000003606e0
[ 268.867711] RBP: ffff8c01ffd83f48 R08: 000000478079a547 R09: 0000000000000000
[ 268.868294] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000025040
[ 268.868885] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[ 268.869433] hardware_disable+0xaa/0xc0 [kvm_intel]
[ 268.869825] kvm_arch_hardware_disable+0x19/0x40 [kvm]
[ 268.870231] hardware_disable_nolock+0x2b/0x30 [kvm]
[ 268.870620] flush_smp_call_function_queue+0x4c/0xf0
[ 268.871003] generic_smp_call_function_single_interrupt+0x13/0x30
[ 268.871470] smp_call_function_interrupt+0x36/0xd0
[ 268.871839] call_function_interrupt+0x84/0x90
[ 268.872181] </IRQ>
[ 268.872351] RIP: 0010:native_safe_halt+0x6/0x10
[ 268.872699] RSP: 0018:ffffa2e1431afe80 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff03
[ 268.873271] RAX: ffffffffacd97150 RBX: 0000000000000006 RCX: 0000000000000000
[ 268.873870] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 268.874414] RBP: ffffa2e1431afe80 R08: 0000000000000002 R09: 0000000000000000
[ 268.874965] R10: 00000000000000b3 R11: 00000000000000a6 R12: 0000000000000006
[ 268.875506] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 268.876048] ? __cpuidle_text_start+0x8/0x8
[ 268.876373] default_idle+0x20/0x100
[ 268.876654] arch_cpu_idle+0x15/0x20
[ 268.876934] default_idle_call+0x23/0x30
[ 268.877242] do_idle+0x172/0x1f0
[ 268.877496] cpu_startup_entry+0x73/0x80
[ 268.877804] start_secondary+0x1ab/0x200
[ 268.878133] secondary_startup_64+0xa5/0xb0
[ 268.878504] Code: 66 90 48 89 c6 48 c1 e8 20 4c 89 e7 81 e6 ff ef ff ff 48 89 c2 e8 6b 78 00 00 66 90 e9 65 ff ff ff e8 ff fc ff ff e9 01 ff ff ff <0f> 01 c4 9c 58 0f 1f 44 00 00 49 89 c4 fa 66 0f 1f 44 00 00>
[ 268.879993] RIP: native_machine_crash_shutdown+0x136/0x190 RSP: ffff8c01ffd83cc0
[ 268.880555] ---[ end trace 636e271a8cdb116f ]---
[ 268.880912] Kernel panic - not syncing: Fatal exception in interrupt
[ 269.951244] Shutting down cpus with NMI
[ 269.962128] Kernel Offset: 0x2b400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 269.963488] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
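
One detail that can be read directly from the first oops: the faulting instruction is the CR4 write in native_write_cr4, the register dump shows CR4 as 00000000003626e0, and RDI (the function's first argument, i.e. the value being written) is 00000000003606e0. The sketch below only does the arithmetic on those two values copied from the log; it is an aid to reading the trace, not a diagnosis. The variable names are made up for illustration.

```shell
# Values copied from the first oops in the log above.
cr4_current=0x3626e0   # CR4 as shown in the register dump
cr4_written=0x3606e0   # RDI: the value native_write_cr4() was asked to write

# XOR isolates the bits that differ between the two values.
diff=$(printf '0x%x' $(( $cr4_current ^ $cr4_written )))
echo "changed bits: $diff"

# Bit 13 (0x2000) is CR4.VMXE, the bit that enables VMX operation.
if [ $(( $cr4_written & 0x2000 )) -eq 0 ]; then
    echo "the write clears CR4.VMXE"
fi
```

The only bit that differs is 0x2000, so the general protection fault is raised while hardware_disable is trying to clear CR4.VMXE on that CPU.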

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1780817

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a kernel version where you were not having this particular problem? This will help determine if the problem you are seeing is the result of a regression, and when this regression was introduced. If this is a regression, we can perform a kernel bisect to identify the commit that introduced the problem.

Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
tags: added: kernel-da-key

These images also affect Xenial:
apt-cache policy linux-image-virtual-hwe-16.04
linux-image-virtual-hwe-16.04:
  Installed: 4.15.0.33.55
  Candidate: 4.15.0.33.55
  Version table:
 *** 4.15.0.33.55 500
        500 http://ua.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status

The bug should probably also be marked as affecting Xenial.

I can confirm this; the issue first showed up in 4.14. I have tried a number of kernels from http://kernel.ubuntu.com/~kernel-ppa/mainline/ and found that the bug was introduced somewhere between 4.13.16 and 4.14.0.

I am running a fresh install of Bionic, not an upgrade.
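
Given that range, a bisect could be started roughly as follows. This is a sketch under assumptions: it presumes a local clone of the mainline Linux tree, and it uses v4.13 as the good endpoint on the assumption that it behaves like 4.13.16 (the stable tag v4.13.16 is not an ancestor of v4.14, so it cannot serve as the endpoint itself). The function name start_bisect and the path in the usage line are hypothetical.

```shell
# Hypothetical helper: begin a bisect in the given repository between a
# known-bad and a known-good revision.
start_bisect() {
    repo=$1
    bad=$2
    good=$3
    # "git bisect start <bad> <good>" marks both endpoints and checks out
    # the first commit to test.
    git -C "$repo" bisect start "$bad" "$good"
}

# Example usage (path is a placeholder):
#   start_bisect ~/src/linux v4.14 v4.13
# Then build and boot each candidate, run the reproducer, and report:
#   git bisect bad     # the panic occurred
#   git bisect good    # the nested VM quit cleanly
```

Each step halves the remaining range, so the v4.13..v4.14 window should need on the order of a dozen or so build-and-test cycles.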

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc3

tags: added: needs-bisect

Tested with v4.19-rc4 mainline build, and the bug still exists.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu Bionic):
status: Triaged → Confirmed
Changed in linux (Ubuntu Bionic):
status: Confirmed → Triaged