VM-related hang on Eoan 5.2.0-7-generic

Bug #1835072 reported by Colin Ian King
Affects: linux (Ubuntu)
Status: Invalid
Importance: Medium
Assigned to: Colin Ian King

Bug Description

When exercising several VMs that are large enough to start causing swapping I occasionally get the following hang:

[21751.030907] INFO: task CPU 2/KVM:16977 blocked for more than 120 seconds.
[21751.031268] Tainted: P OE 5.2.0-7-generic #8
[21751.031624] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[21751.032005] CPU 2/KVM D 0 16977 1 0x00000100
[21751.032007] Call Trace:
[21751.032014] __schedule+0x2ba/0x650
[21751.032016] schedule+0x31/0xa0
[21751.032018] rwsem_down_read_failed+0xec/0x180
[21751.032021] ? hrtimer_try_to_cancel+0x2e/0x110
[21751.032023] down_read+0x45/0x50
[21751.032049] kvm_host_page_size+0x42/0x90 [kvm]
[21751.032064] mapping_level+0x60/0x130 [kvm]
[21751.032078] tdp_page_fault+0xb6/0x280 [kvm]
[21751.032084] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032088] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032102] kvm_mmu_page_fault+0x79/0x640 [kvm]
[21751.032106] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032109] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032112] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032115] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032118] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032121] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032124] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032127] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032130] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032133] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032136] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032139] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032142] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032145] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032148] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032151] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
[21751.032154] ? vmx_vmexit+0xf/0x30 [kvm_intel]
[21751.032157] handle_ept_violation+0xcf/0x220 [kvm_intel]
[21751.032161] vmx_handle_exit+0xab/0x7d0 [kvm_intel]
[21751.032174] vcpu_enter_guest+0x2db/0x1500 [kvm]
[21751.032178] ? skip_emulated_instruction+0x75/0xb0 [kvm_intel]
[21751.032191] ? kvm_skip_emulated_instruction+0x59/0xa0 [kvm]
[21751.032205] kvm_arch_vcpu_ioctl_run+0xd5/0x580 [kvm]
[21751.032217] kvm_vcpu_ioctl+0x24f/0x620 [kvm]
[21751.032219] ? __seccomp_filter+0x7e/0x6c0
[21751.032221] ? __switch_to_asm+0x40/0x70
[21751.032222] ? __switch_to_asm+0x34/0x70
[21751.032224] ? __switch_to_asm+0x40/0x70
[21751.032225] ? __switch_to_asm+0x34/0x70
[21751.032227] ? __switch_to_asm+0x40/0x70
[21751.032228] ? __switch_to_asm+0x34/0x70
[21751.032230] ? __switch_to_asm+0x40/0x70
[21751.032232] ? __switch_to_asm+0x34/0x70
[21751.032234] do_vfs_ioctl+0xad/0x640
[21751.032236] ? __secure_computing+0x42/0xd0
[21751.032238] ksys_ioctl+0x6b/0xa0
[21751.032239] __x64_sys_ioctl+0x1e/0x30
[21751.032242] do_syscall_64+0x5e/0x140
[21751.032244] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[21751.032245] RIP: 0033:0x7f64a3fb0417
[21751.032250] Code: Bad RIP value.
[21751.032251] RSP: 002b:00007f64937fd578 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[21751.032252] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f64a3fb0417
[21751.032253] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000014
[21751.032254] RBP: 0000000000000000 R08: 00005589927145d0 R09: 000000003b9aca00
[21751.032255] R10: 0000000000000001 R11: 0000000000000246 R12: 00005589935ff710
[21751.032255] R13: 00007f64a199d000 R14: 0000000000000000 R15: 00005589935ff710
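
The swap-pressure setup described above can be approximated with a sketch like the following. The guest sizing, the two-guest count, and the use of stress-ng inside each guest are illustrative assumptions, not details taken from this report:

```shell
# Hedged reproducer sketch (assumptions: QEMU/KVM guests, stress-ng
# available inside each guest). The idea is to size the guests so their
# combined RAM exceeds host memory, so exercising them forces the host
# to start swapping.
host_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)  # host RAM in KiB
guest_mb=$(( host_kb / 1024 * 3 / 4 ))                  # each guest ~75% of host RAM

echo "host: $(( host_kb / 1024 )) MiB, per-guest: ${guest_mb} MiB"

# Then, inside each guest, generate memory pressure, e.g.:
#   stress-ng --vm 4 --vm-bytes 90% --timeout 10m
```

With two such guests active at once, anonymous memory on the host overcommits and the host kernel begins swapping guest pages, which is the condition under which the hang was observed.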

Tags: cscc
Revision history for this message
Colin Ian King (colin-king) wrote:

Saving a running VM's machine state seemed to trigger this issue, but it may just be a coincidence.

Changed in linux (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Colin Ian King (colin-king) wrote:

Running a similar setup on an ext4 backing store can also trigger OOMs without the hangs, whereas the first hang occurred when running on a ZFS backing store.

Jul 2 19:18:52 mahobay kernel: [ 1579.086409] Call Trace:
Jul 2 19:18:52 mahobay kernel: [ 1579.086415] dump_stack+0x67/0x86
Jul 2 19:18:52 mahobay kernel: [ 1579.086417] dump_header+0x54/0x2fb
Jul 2 19:18:52 mahobay kernel: [ 1579.086419] ? cpuset_mems_allowed_intersects+0x25/0x30
Jul 2 19:18:52 mahobay kernel: [ 1579.086421] oom_kill_process.cold.31+0xb/0x10
Jul 2 19:18:52 mahobay kernel: [ 1579.086422] out_of_memory+0x1c0/0x480
Jul 2 19:18:52 mahobay kernel: [ 1579.086424] __alloc_pages_slowpath+0xb6d/0xeb0
Jul 2 19:18:52 mahobay kernel: [ 1579.086427] __alloc_pages_nodemask+0x2e3/0x330
Jul 2 19:18:52 mahobay kernel: [ 1579.086429] alloc_pages_vma+0x7e/0x1d0
Jul 2 19:18:52 mahobay kernel: [ 1579.086431] __handle_mm_fault+0x93d/0x1240
Jul 2 19:18:52 mahobay kernel: [ 1579.086433] handle_mm_fault+0xc9/0x1f0
Jul 2 19:18:52 mahobay kernel: [ 1579.086435] __get_user_pages+0x248/0x720
Jul 2 19:18:52 mahobay kernel: [ 1579.086437] get_user_pages_unlocked+0x156/0x1f0
Jul 2 19:18:52 mahobay kernel: [ 1579.086456] __gfn_to_pfn_memslot+0x12e/0x400 [kvm]
Jul 2 19:18:52 mahobay kernel: [ 1579.086471] try_async_pf+0x89/0x250 [kvm]
Jul 2 19:18:52 mahobay kernel: [ 1579.086486] tdp_page_fault+0x140/0x280 [kvm]
Jul 2 19:18:52 mahobay kernel: [ 1579.086491] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:52 mahobay kernel: [ 1579.086505] kvm_mmu_page_fault+0x79/0x640 [kvm]
Jul 2 19:18:52 mahobay kernel: [ 1579.086509] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086512] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086515] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086518] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086522] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086525] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086528] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086532] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086535] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086538] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086542] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086545] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086548] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086551] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086554] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086557] ? vmx_vmexit+0x1b/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086560] ? vmx_vmexit+0xf/0x30 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086564] handle_ept_violation+0xcf/0x220 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086567] vmx_handle_exit+0xab/0x7d0 [kvm_intel]
Jul 2 19:18:53 mahobay kernel: [ 1579.086581] ...


Brad Figg (brad-figg)
tags: added: cscc
Colin Ian King (colin-king) wrote:

After a fresh install running 5.2.0-8-generic, I can no longer trigger this. Closing this bug report.

Changed in linux (Ubuntu):
status: In Progress → Invalid