Kernel Crash when running on nested environment

Bug #1784393 reported by Salvador Fuentes Garcia
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: High
Assigned to: Unassigned

Bug Description

When I run an Ubuntu 17.10 or 18.04 VM and then launch a Kata Container inside it (which creates a new, nested VM), I sometimes get a kernel panic in the first VM.

I don't know which distro the host runs, since this is a cloud environment.
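
Some information about the host side can still be gathered from inside the first VM. The following is a minimal sketch, not something taken from the reproducer script: it reads `/proc/cpuinfo` and the standard DMI entries under `/sys/class/dmi/id/`, and the helper names (`cpu_flags`, `read_or_none`, `describe_guest`) are hypothetical. The `vmx` flag only shows that the hypervisor exposes VT-x to this VM; it says nothing definite about the host distro.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def read_or_none(path):
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def describe_guest():
    """Summarise what this VM can see about its virtualization environment."""
    flags = cpu_flags(read_or_none("/proc/cpuinfo") or "")
    return {
        # Set by the kernel when CPUID reports a hypervisor.
        "running_under_hypervisor": "hypervisor" in flags,
        # VT-x visible in the guest, i.e. nested virtualization is offered.
        "vmx_exposed_to_guest": "vmx" in flags,
        # DMI strings supplied by the hypervisor's virtual firmware.
        "dmi_sys_vendor": read_or_none("/sys/class/dmi/id/sys_vendor"),
        "dmi_product_name": read_or_none("/sys/class/dmi/id/product_name"),
        "dmi_bios_version": read_or_none("/sys/class/dmi/id/bios_version"),
    }


if __name__ == "__main__":
    for key, value in describe_guest().items():
        print(f"{key}: {value}")
```

The DMI strings are the same ones the kernel prints on the "Hardware name:" line of an oops, so they may at least hint at the virtualization stack in use.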

It is easily reproducible with this script:
https://gist.github.com/chavafg/d00fa4dbefb144e6b7eeceb5e1ad9c65

This is the kernel panic from Ubuntu 17.10:

[ 7893.642642] general protection fault: 0000 [#1] SMP PTI
[ 7893.644919] Modules linked in: vhost_net vhost macvtap macvlan tap veth kvm_intel kvm irqbypass ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay nls_iso8859_1 ppdev crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev input_leds serio_raw parport_pc parport ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid aesni_intel aes_x86_64 crypto_simd floppy cryptd glue_helper psmouse virtio_net virtio_blk
[ 7893.666649] [last unloaded: irqbypass]
[ 7893.667910] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-45-generic #50-Ubuntu
[ 7893.670283] Hardware name: RDO OpenStack Compute, BIOS 1.11.0-2.el7 04/01/2014
[ 7893.672483] task: ffffffffb0412480 task.stack: ffffffffb0400000
[ 7893.674218] RIP: 0010:native_write_cr4+0x4/0x10
[ 7893.675549] RSP: 0018:ffff8b08bfc03f50 EFLAGS: 00010006
[ 7893.677138] RAX: 00000000000626f0 RBX: 0000000000000000 RCX: ffff8b08bfc23cd0
[ 7893.679254] RDX: ffff8b08bfc14020 RSI: ffff8b08bfc23cd0 RDI: 00000000000606f0
[ 7893.681478] RBP: ffff8b08bfc03f50 R08: 00000a6d223ad8ee R09: ffff8b08ac931800
[ 7893.683695] R10: 00000001001cf7c1 R11: 0000000000000061 R12: 0000000000023cd0
[ 7893.685895] R13: 0000000000000001 R14: ffff8b08bff4a800 R15: 000000000008a000
[ 7893.688171] FS: 0000000000000000(0000) GS:ffff8b08bfc00000(0000) knlGS:0000000000000000
[ 7893.690855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7893.692658] CR2: 00007fb2f8fdc020 CR3: 000000035b80a003 CR4: 00000000000626f0
[ 7893.694909] Call Trace:
[ 7893.695837] <IRQ>
[ 7893.696664] hardware_disable+0x99/0xb0 [kvm_intel]
[ 7893.698380] kvm_arch_hardware_disable+0x19/0x40 [kvm]
[ 7893.700043] hardware_disable_nolock+0x2b/0x30 [kvm]
[ 7893.701712] flush_smp_call_function_queue+0x5c/0x100
[ 7893.703335] generic_smp_call_function_single_interrupt+0x13/0x30
[ 7893.705321] smp_call_function_interrupt+0x2d/0x40
[ 7893.706883] call_function_interrupt+0x1af/0x1c0
[ 7893.708529] </IRQ>
[ 7893.709361] RIP: 0010:native_safe_halt+0x6/0x10
[ 7893.710878] RSP: 0018:ffffffffb0403df8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff03
[ 7893.713311] RAX: ffffffffafb1be10 RBX: ffffffffb0660720 RCX: 0000000000000000
[ 7893.715635] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 7893.717890] RBP: ffffffffb0403df8 R08: 0000000000000002 R09: ffff8b08ac931800
[ 7893.720044] R10: 00000001001cf7c1 R11: 0000000000000061 R12: 0000000000000000
[ 7893.722234] R13: 0000000000000000 R14: ffff8b08bff4a800 R15: 000000000008a000
[ 7893.724426] ? __cpuidle_text_start+0x8/0x8
[ 7893.725828] default_idle+0x20/0x100
[ 7893.727209] arch_cpu_idle+0x15/0x20
[ 7893.728450] default_idle_call+0x23/0x30
[ 7893.729613] do_idle+0x16f/0x1e0
[ 7893.730643] cpu_startup_entry+0x73/0x80
[ 7893.731825] rest_init+0xbc/0xc0
[ 7893.732840] start_kernel+0x4c8/0x4e9
[ 7893.733974] ? early_idt_handler_array+0x120/0x120
[ 7893.735306] x86_64_start_reservations+0x24/0x26
[ 7893.736607] x86_64_start_kernel+0x13a/0x15d
[ 7893.737887] secondary_startup_64+0x9f/0xa0
[ 7893.739139] Code: 0f 1f 80 00 00 00 00 55 48 89 e5 0f 20 d8 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 0f 22 df 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 <0f> 22 e7 5d c3 0f 1f 80 00 00 00 00 55 48 89 e5 44 0f 20 c0 5d
[ 7893.744104] RIP: native_write_cr4+0x4/0x10 RSP: ffff8b08bfc03f50
[ 7893.745775] ---[ end trace fccf997ea9a59d36 ]---
[ 7893.747097] Kernel panic - not syncing: Fatal exception in interrupt
[ 7893.749271] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7893.752179] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
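
One way to read the register dump above is to compare the CR4 value at the time of the fault (`CR4`/`RAX` = `0x626f0`) with the value being written (`RDI` = `0x606f0`; the faulting bytes `0f 22 e7` encode `mov %rdi,%cr4`). The sketch below decodes both using the CR4 bit names from the Intel SDM. It is an interpretation of the trace, not a confirmed diagnosis, and the helper name `cr4_flags` is hypothetical.

```python
# CR4 flag names by bit position, following the Intel SDM (reserved bits omitted).
CR4_BITS = {
    0: "VME", 1: "PVI", 2: "TSD", 3: "DE", 4: "PSE", 5: "PAE", 6: "MCE",
    7: "PGE", 8: "PCE", 9: "OSFXSR", 10: "OSXMMEXCPT", 11: "UMIP",
    12: "LA57", 13: "VMXE", 14: "SMXE", 16: "FSGSBASE", 17: "PCIDE",
    18: "OSXSAVE", 20: "SMEP", 21: "SMAP", 22: "PKE",
}


def cr4_flags(value):
    """Return the set of flag names that are set in a CR4 value."""
    return {name for bit, name in CR4_BITS.items() if value & (1 << bit)}


# Values copied from the register dump in the trace.
current_cr4 = 0x626F0    # CR4 (and RAX) when the fault was raised
attempted_cr4 = 0x606F0  # RDI, the operand of the faulting CR4 write

cleared = cr4_flags(current_cr4) - cr4_flags(attempted_cr4)
print("flags being cleared:", sorted(cleared))
# prints: flags being cleared: ['VMXE']
```

The only difference is bit 13 (VMXE), which fits the call trace: `hardware_disable` in `kvm_intel` was turning VMX off on this CPU when the write to CR4 raised the general protection fault. Why the write was rejected in this nested setup is not something the trace alone establishes.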

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1784393

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: artful
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.18-rc7

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Salvador Fuentes Garcia (fuentess) wrote :

Hi Joseph, thanks for your response.

The issue started when we began running on this cloud, roughly 2 months ago, so I am not sure whether it was present before that.

This kernel should be tested on the bare-metal host, right?
If so, I'll see whether our cloud provider can try it.

Thank you, regards.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired