memory corruption with migrate/savevm in TCG mode

Bug #1497479 reported by Pavel Boldin
This bug affects 1 person
Affects: QEMU
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

[ISSUE]

QEMU releases 2.3.1 and earlier do not flush the TLBs before enabling the global dirty page log and entering the final stage of saving the VM.

[DESCRIPTION]

The situation is the following:
1. In TCG mode, a TLB miss is the only way a page gets marked dirty.
2. If the running VM code always hits the TLB while the migration thread executes `ram_save_iterate', the pages it writes to are missing from the dirty log. This happens, for instance, when the VM is mostly idle and the kernel only handles APIC timer interrupts.
3. These pages are then also missed during the `ram_save_complete' stage.
4. As a result, the memory content in the saved VM state differs from the actual VM memory.
5. If the affected memory pages contain kernel data structures, this inconsistency can corrupt them, causing the kernel to Oops after the saved state is loaded.
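
The sequence above can be illustrated with a toy model. This is not QEMU code: the `ToyVM` class, its fields and the `save` helper are hypothetical, and the model assumes only what the list states, namely that a write is recorded as dirty when it misses the TLB and goes unrecorded when it hits.

```python
class ToyVM:
    """Toy model of TCG dirty tracking; every name here is hypothetical."""

    def __init__(self, num_pages):
        self.memory = [0] * num_pages
        self.tlb = set()        # pages that already have a cached entry
        self.dirty_log = set()  # pages recorded as dirty
        self.logging = False

    def log_global_start(self):
        # Models the reported behaviour: logging starts, the TLB is left intact.
        self.logging = True

    def write(self, page, value):
        self.memory[page] = value
        if page not in self.tlb:
            # TLB miss: the only place where dirtying is recorded.
            if self.logging:
                self.dirty_log.add(page)
            self.tlb.add(page)


def save(vm, guest_activity):
    """Copy all pages, let the guest run, then resend only logged pages."""
    vm.log_global_start()
    snapshot = list(vm.memory)      # bulk copy (iterate stage)
    guest_activity(vm)              # the guest keeps running during the copy
    for page in vm.dirty_log:       # completion stage
        snapshot[page] = vm.memory[page]
    return snapshot


vm = ToyVM(num_pages=4)
vm.write(1, 10)                     # page 1 enters the TLB before the save
saved = save(vm, lambda v: v.write(1, 99))
print(saved == vm.memory)           # False: the write to page 1 was never logged
```

In this model the saved image still holds the old value of page 1, which corresponds to step 4 of the list.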

[SOLUTION]

The proposed solution is to flush the TLB when `log_global_start' is called.
Here is the patch: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1493049/+attachment/4459905/+files/tcg-commit-on-log-global-start.patch
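
To show why a flush at that point helps, here is a minimal sketch of the idea in a toy model. It is not the attached patch and does not use QEMU APIs: `run_save` and `flush_on_log_start` are hypothetical names, and the model assumes that only a TLB miss records a page as dirty.

```python
def run_save(flush_on_log_start):
    """Return True if the toy snapshot matches guest memory after the save."""
    memory = {0: "old"}       # a single guest page
    tlb = {0}                 # page 0 is already cached before the save starts
    dirty_log = set()

    # Start of dirty logging: optionally flush the TLB first.
    if flush_on_log_start:
        tlb.clear()

    snapshot = dict(memory)   # bulk copy, as in the iterate stage

    # The guest writes page 0 while the copy is in progress.
    memory[0] = "new"
    if 0 not in tlb:          # TLB miss: the write is recorded
        dirty_log.add(0)
        tlb.add(0)

    # Completion stage: resend only the pages found in the dirty log.
    for page in dirty_log:
        snapshot[page] = memory[page]
    return snapshot == memory


print(run_save(flush_on_log_start=False))  # False
print(run_save(flush_on_log_start=True))   # True
```

With the flush, the first write after logging starts is forced through the miss path, so the page lands in the dirty log and is resent.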

[LINKS]

Ubuntu bug: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1493049

Pavel Boldin (pboldin)
description: updated
Serge Hallyn (serge-hallyn) wrote :

Hi,

is this a duplicate of 1493049? (Should they be merged?)

Pavel Boldin (pboldin) wrote :

Hi,

This one is for QEMU master, 1493049 is for Ubuntu packages.

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1497479] Re: memory corruption with migrate/savevm in TCG mode

Generally combining them is still better - but if it helps you to
keep things straight then no problem, sorry for the noise - thanks.

Thomas Huth (th-huth) wrote :

Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU? Or could we close this ticket nowadays? If you still can reproduce the issue, please send your patch to the qemu-devel mailing list for discussion (we generally do not take patches from the bugtracker). See https://wiki.qemu.org/Contribute/SubmitAPatch for details.

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired