Comment 4 for bug 1811259

Colin Ian King (colin-king) wrote :

Finally bisected:

git bisect good
86a559787e6f5cf662c081363f64a20cad654195 is the first bad commit
commit 86a559787e6f5cf662c081363f64a20cad654195
Author: Wei Wang <email address hidden>
Date: Mon Aug 27 09:32:17 2018 +0800

    virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

    Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
    support of reporting hints of guest free pages to host via virtio-balloon.
    Currently, only free page blocks of MAX_ORDER - 1 are reported. They are
    obtained one by one from the mm free list via the regular allocation
    function.

    Host requests the guest to report free page hints by sending a new cmd id
    to the guest via the free_page_report_cmd_id configuration register. When
    the guest starts to report, it first sends a start cmd to host via the
    free page vq, which acks to host the cmd id received. When the guest
    finishes reporting free pages, a stop cmd is sent to host via the vq.
    Host may also send a stop cmd id to the guest to stop the reporting.

    VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest
    reporting.
    VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
    the reported pages are ready to be freed.

    Why does the guest free the reported pages when host tells it is ready to
    free?
    This is because freeing pages appears to be expensive for live migration.
    free_pages() dirties memory very quickly and makes the live migration not
    converge in some cases. So it is good to delay the free_page operation
    until the migration is done, and host sends a command to guest about that.

    Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
    VIRTIO_BALLOON_CMD_ID_STOP?
    This is because live migration is usually done in several rounds. At the
    end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
    the guest to stop (or say pause) the reporting. The guest resumes the
    reporting when it receives a new command id at the beginning of the next
    round. So we need a new cmd id to distinguish between "stop reporting" and
    "ready to free the reported pages".

    TODO:
    - Add a batch page allocation API to amortize the allocation overhead.

    Signed-off-by: Wei Wang <email address hidden>
    Signed-off-by: Liang Li <email address hidden>
    Cc: Michael S. Tsirkin <email address hidden>
    Cc: Michal Hocko <email address hidden>
    Cc: Andrew Morton <email address hidden>
    Cc: Linus Torvalds <email address hidden>
    Signed-off-by: Michael S. Tsirkin <email address hidden>
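
For anyone reading the bisect result, the cmd id handshake described in the commit message can be modelled roughly as below. This is only a sketch of the protocol as the message describes it, not the driver code: struct guest_model, guest_handle_cmd_id() and guest_report_block() are hypothetical names, and the numeric values chosen for the STOP and DONE ids are assumptions that should be checked against the uapi header of the kernel under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only; not taken from the commit message. */
#define CMD_ID_STOP 0u
#define CMD_ID_DONE 1u

/* Hypothetical model of the guest-side state, not the driver's real struct. */
struct guest_model {
    int reporting;          /* 1 while the guest is sending free page hints */
    uint32_t acked_cmd_id;  /* last cmd id echoed back to host in a start cmd */
    unsigned held_blocks;   /* blocks reported to host and not yet freed */
};

/* Hypothetical helper: react to a new cmd id read from the config register. */
static void guest_handle_cmd_id(struct guest_model *g, uint32_t cmd_id)
{
    assert(g != NULL);

    switch (cmd_id) {
    case CMD_ID_STOP:
        /* End of a migration round: pause reporting, keep the blocks held. */
        g->reporting = 0;
        break;
    case CMD_ID_DONE:
        /* Migration finished: only now is it cheap to free the blocks. */
        g->reporting = 0;
        g->held_blocks = 0;
        break;
    default:
        /* Any other id starts (or resumes) a round; the guest acks the id. */
        g->acked_cmd_id = cmd_id;
        g->reporting = 1;
        break;
    }
}

/* Hypothetical helper: model reporting one free page block to the host. */
static int guest_report_block(struct guest_model *g)
{
    assert(g != NULL);

    if (!g->reporting)
        return 0;           /* nothing is reported while paused */
    g->held_blocks++;       /* block stays off the free list until DONE */
    return 1;
}
```

The point of the model is the difference between the two special ids: STOP leaves held_blocks untouched so a later round can continue, while DONE is the only event that releases the held blocks.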