Comment 25 for bug 1719045

Marcelo Cerri (mhcerri) wrote :

Thanks for all the testing, Josh. I reverted the paravirtualized TLB flushing patches in the test kernel. Do you think we should spin a new kernel without them while we look for the root cause of the problem?

I enabled the tracepoint available in arch/x86/hyperv/mmu.c for both the mainline and linux-azure kernels and got some interesting information. The mainline kernel never calls flush_tlb_others with TLB_FLUSH_ALL, while the 4.11 and 4.13 linux-azure kernels do so very often.

I'm attaching the tracing files for both kernels. You can check that TLB_FLUSH_ALL is given to flush_tlb_others when `end` is equal to "ffffffffffffffff" (-1ULL).
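For anyone who wants to scan the traces mechanically, here is a minimal userspace sketch of that check. It assumes each trace line carries the value as a hex field introduced by the word `end` (followed by a space or `=`); that layout, the helper name `trace_line_is_full_flush` and the macro `FLUSH_ALL_SENTINEL` are all made up for illustration and may need adjusting to the real trace output.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: all-ones 64-bit value, printed as ffffffffffffffff. */
#define FLUSH_ALL_SENTINEL (~0ULL)

/*
 * Return true if the line has an "end" field whose hex value is all ones.
 * Assumption: the field looks like "end <hex>" or "end=<hex>".
 */
static bool trace_line_is_full_flush(const char *line)
{
    const char *p = strstr(line, "end");

    while (p != NULL) {
        const char *q = p + 3;
        /* Only accept "end" as a standalone field name. */
        bool starts_field = (p == line || p[-1] == ' ' || p[-1] == ',');

        if (starts_field && (*q == ' ' || *q == '=')) {
            char *stop;
            unsigned long long end = strtoull(q + 1, &stop, 16);

            if (stop != q + 1)          /* at least one hex digit parsed */
                return end == FLUSH_ALL_SENTINEL;
        }
        p = strstr(p + 3, "end");
    }
    return false;
}
```

Feeding each line of a trace file through this helper and counting the hits should give a rough idea of how often the full-flush path is taken on each kernel.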

Also, if I force hyperv_flush_tlb_others_ex() to do a native flush when end is equal to TLB_FLUSH_ALL, the problem does not occur. That is another option for a temporary fix.
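To make the idea concrete, here is a rough userspace model of the decision. It is not the actual patch: the real hyperv_flush_tlb_others_ex() has a different signature and does much more, and the names `choose_flush_path`, `enum flush_path` and `TLB_FLUSH_ALL_SENTINEL` are hypothetical, used only for this illustration.

```c
/* Hypothetical stand-in for the kernel's TLB_FLUSH_ALL (all bits set). */
#define TLB_FLUSH_ALL_SENTINEL (~0UL)

/* Which mechanism would handle the remote flush in this model. */
enum flush_path {
    FLUSH_NATIVE,      /* fall back to the regular IPI-based flush */
    FLUSH_HYPERCALL    /* use the paravirtualized Hyper-V flush */
};

/*
 * Model of the workaround: a full flush request (end is all ones)
 * is routed to the native path; ranged flushes keep using the hypercall.
 */
static enum flush_path choose_flush_path(unsigned long start, unsigned long end)
{
    (void)start;  /* the start address does not affect the decision here */

    if (end == TLB_FLUSH_ALL_SENTINEL)
        return FLUSH_NATIVE;

    return FLUSH_HYPERCALL;
}
```

With this shape, only the full-flush case is diverted, so ranged flushes would still benefit from the paravirtualized path while the underlying bug is investigated.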

I believe the mainline kernel carries the same bug as linux-azure, but the problematic path (end == TLB_FLUSH_ALL) is not being executed there.