So, basically, the problem seems to be in the way QEMU dirties pages after the *bulk* migration of the RAM is done by calls to `ram_save_iterate`.
QEMU only dirties pages when there is a *miss* in the TLB cache: the code generated by `tcg_out_tlb_load`, called from inside `tcg_out_qemu_st`, then calls one of the `helper_le_st*_mmu` helpers, which invalidates the page by calling `x86_cpu_handle_mmu_fault`, which in turn calls `stl_phys_notdirty`. That last call sets the necessary bits in `ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]`.
If the changed memory page is already present in the TLB cache, it is not dirtied. This mostly happens for the *most used* pages, such as the structures touched by the timer interrupt or `delay_tsc`, and that is exactly where it happens in our case (see the sketch below).
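To make the asymmetry concrete, here is a minimal, self-contained C sketch of the behaviour described above. It is *not* QEMU code: the names (`store_byte`, `store_slow_path`, `dirty_bitmap`, the toy TLB) are invented for illustration and stand in for the real `tcg_out_tlb_load` fast path, the `helper_le_st*_mmu` slow path, and `ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]`.

```c
/* Simplified model of the dirty-tracking asymmetry -- NOT QEMU code.
 * All names here are hypothetical; the real call chain is
 * tcg_out_tlb_load -> helper_le_st*_mmu -> x86_cpu_handle_mmu_fault
 * -> stl_phys_notdirty, which sets DIRTY_MEMORY_MIGRATION bits. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SHIFT  12
#define PAGE_SIZE   (1u << PAGE_SHIFT)
#define NUM_PAGES   16
#define TLB_ENTRIES 8

static uint8_t guest_ram[NUM_PAGES * PAGE_SIZE];
static bool dirty_bitmap[NUM_PAGES];        /* ~ DIRTY_MEMORY_MIGRATION */

struct tlb_entry { uint32_t page; bool valid; };
static struct tlb_entry tlb[TLB_ENTRIES];   /* tiny direct-mapped TLB */

/* Slow path: on a TLB miss the store helper refills the TLB entry and
 * marks the page dirty (the analogue of the stl_phys_notdirty step). */
static void store_slow_path(uint32_t addr, uint8_t val)
{
    uint32_t page = addr >> PAGE_SHIFT;
    tlb[page % TLB_ENTRIES] = (struct tlb_entry){ .page = page, .valid = true };
    dirty_bitmap[page] = true;              /* the page is dirtied only here */
    guest_ram[addr] = val;
}

/* Fast path: a TLB hit writes straight to RAM and never touches the
 * dirty bitmap -- this is the asymmetry that loses pages during migration. */
static void store_byte(uint32_t addr, uint8_t val)
{
    uint32_t page = addr >> PAGE_SHIFT;
    struct tlb_entry *e = &tlb[page % TLB_ENTRIES];
    if (e->valid && e->page == page) {
        guest_ram[addr] = val;              /* hit: no dirty tracking */
    } else {
        store_slow_path(addr, val);         /* miss: page gets dirtied */
    }
}

int main(void)
{
    store_byte(0x1000, 0xaa);               /* miss: page 1 marked dirty */

    /* Bulk migration pass: RAM is copied and the dirty bitmap cleared. */
    memset(dirty_bitmap, 0, sizeof(dirty_bitmap));

    store_byte(0x1001, 0xbb);               /* hit: page 1 changes but stays
                                               clean -> source/dest diverge */
    printf("page 1 dirty after second store: %d\n", dirty_bitmap[1]);
    return 0;
}
```

In this toy model, once the bulk pass clears the bitmap, any frequently written page that is already cached in the TLB keeps taking the fast path and is never re-marked dirty, which is exactly the failure mode described for the hot timer/`delay_tsc` pages.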
This is why those pages differ between the source and the destination of the migration.