Comment 34 for bug 1413540

Chris J Arges (arges) wrote on 2015-03-25:

#34

I've added instructions for a workaround. The code paths I've seen in crashes have been the following:

kvm_sched_in
 -> kvm_arch_vcpu_load
  -> vmx_vcpu_load
   -> loaded_vmcs_clear
    -> smp_call_function_single

pmdp_clear_flush
 -> flush_tlb_mm_range
  -> native_flush_tlb_others
   -> smp_call_function_many

Generally this has been caused by workloads that use nested VMs and stress the L2/L1 VMs (causing non-local CPU TLB flushing or VMCS clearing).

The hang is in csd_lock_wait, waiting for the CSD_FLAG_LOCK bit to be cleared, which can only be triggered by non-local smp_call_function_* calls.
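
To make the failure mode concrete, here is a rough userspace model of the wait described above. It is only a sketch and not the kernel source: the struct, the csd_model_* function names, the flag value and the spin bound are all assumptions made for illustration. The sender sets the lock bit before signalling the other CPU and then polls until the target clears it; if the target never gets to run the request, the bit stays set and the sender never makes progress.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same name as the flag mentioned above; the value here is an assumption. */
#define CSD_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the per-request call data. */
struct csd_model {
    atomic_uint flags;
};

/* Sender side: mark the request as in flight before signalling the target. */
static void csd_model_lock(struct csd_model *csd)
{
    atomic_fetch_or_explicit(&csd->flags, CSD_FLAG_LOCK, memory_order_acquire);
}

/* Target side: clear the bit once the requested function has run. */
static void csd_model_unlock(struct csd_model *csd)
{
    atomic_fetch_and_explicit(&csd->flags, ~CSD_FLAG_LOCK, memory_order_release);
}

/*
 * Sender side: poll until the lock bit is cleared.
 * This model gives up after max_spins polls and returns false, so that a
 * request nobody completes shows up as a return value instead of a hang.
 * An unbounded version of this loop would simply spin forever.
 */
static bool csd_model_lock_wait(struct csd_model *csd, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        unsigned int flags =
            atomic_load_explicit(&csd->flags, memory_order_acquire);
        if (!(flags & CSD_FLAG_LOCK))
            return true;   /* target finished and released the request */
    }
    return false;          /* bit still set: the target never responded */
}
```

In this model, calling csd_model_lock and then csd_model_lock_wait without anyone calling csd_model_unlock returns false, which corresponds to the sender being stuck. A purely local call never needs this handshake, which fits the observation that only the non-local paths are affected.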

Another data point is that this can happen with x2apic as well as flat apic (as tested with nox2apic).