live-migration-permit-post-copy mode does not work

Bug #1950894 reported by Erlon R. Cruz
This bug affects 1 person
Affects                       Status        Importance  Assigned to    Milestone
OpenStack Nova Compute Charm  Fix Released  Undecided   Erlon R. Cruz

Bug Description

Description
===========
Some customers have reported that certain VMs never complete a
live migration. The amount of memory left to copy keeps oscillating
around 1-10% but never reaches zero. After setting
live-migration-permit-post-copy = True, we expected the migration to
converge and complete, as the description of this feature says it
should, but it does not.

Workaround 1: It's possible to complete the process if you log into the source
host and run the QMP command[1]:

virsh qemu-monitor-command instance-00000026 '{"execute":"migrate-start-postcopy"}'

Workaround 2: The migration finishes if you run 'nova live-migration-force-complete'
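
For anyone scripting Workaround 1, here is a minimal sketch that builds the same virsh call for an arbitrary domain. The function name `postcopy_command` is made up for illustration; the command and QMP payload are the ones shown above. Actually running it (for example with `subprocess.run`) is left to the operator on the source host.

```python
import json
import shlex


def postcopy_command(domain):
    """Build the argv for the virsh call from Workaround 1.

    `domain` is the libvirt domain name, e.g. "instance-00000026".
    """
    # QMP payload taken from the workaround above.
    qmp = json.dumps({"execute": "migrate-start-postcopy"})
    return ["virsh", "qemu-monitor-command", domain, qmp]


if __name__ == "__main__":
    # Print a shell-quoted form that can be pasted on the source host.
    print(shlex.join(postcopy_command("instance-00000026")))
```

Building the payload with `json.dumps` avoids hand-quoting mistakes when the domain name comes from a variable.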

I believe this could also be a libvirt bug: the nova/libvirt logs[4] show no
"migrate-start-postcopy" command until I triggered it manually with the command
above, at 2021-11-12 19:14:08.053+0000[4].

Steps to reproduce
==================

* Set up an OpenStack deployment with live_migration_permit_post_copy=False
* Create a large VM (8+ CPUs) and install stress-ng
* Run stress-ng:
  nohup stress-ng --vm 4 --vm-bytes 10% --vm-method write64 --vm-addr-method pwr2 -t 1h &
* Migrate the VM and check the source host logs for messages like:
  'Migration running for \d+ secs, memory \d+% remaining'
  The remaining percentage should oscillate as described above, and the migration should not complete
* Complete or cancel the above migration, set live_migration_permit_post_copy=True,
  restart nova services on the computes, and re-do the operation
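
To spot the oscillation without reading the log by eye, something like the sketch below can be run over the source host's nova-compute log lines. It uses the message pattern quoted in the steps above. The helper names and the `window` heuristic are assumptions for illustration, not part of Nova.

```python
import re

# Pattern taken from the reproduction steps, with capture groups added.
PROGRESS_RE = re.compile(
    r"Migration running for (\d+) secs, memory (\d+)% remaining"
)


def memory_remaining(log_lines):
    """Return [(elapsed_secs, percent_remaining), ...] for matching lines."""
    samples = []
    for line in log_lines:
        match = PROGRESS_RE.search(line)
        if match:
            samples.append((int(match.group(1)), int(match.group(2))))
    return samples


def looks_stalled(samples, window=5):
    """Heuristic (an assumption, not Nova logic): the migration looks
    stalled when the last `window` samples never drop below the lowest
    percentage seen before them."""
    if len(samples) <= window:
        return False
    earlier_min = min(pct for _, pct in samples[:-window])
    recent_min = min(pct for _, pct in samples[-window:])
    return recent_min >= earlier_min
```

Usage would be along the lines of `looks_stalled(memory_remaining(open(path)))`, where `path` points at the nova-compute log on the source host.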

Expected result
===============
Migration should complete 100% of the time.

Actual result
=============
The migration does not complete; the VM's memory copy never finishes.

Environment
===========
1. Exact version of OpenStack you are running[8]

21.2.1-0ubuntu1

2. Which hypervisor did you use[8]?

qemu-kvm: 4.2-3ubuntu6.18
libvirt-daemon: 6.0.0-0ubuntu8.14

3. Which storage type did you use?

Shared Ceph

4. Which networking type did you use?

OpenvSwitch L3HA

Logs & Configs
==============

[1] QMP Commands: https://gist.github.com/sombrafam/5e8e991058001c2b3843c0d08b4cd7d1
[2] Migration (completed manually with workaround 1) logs: https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
[3] nova-compute.log src: https://gist.github.com/sombrafam/b74497150ae4ae32494ac5735189e149
[4] libvirt.log src: https://gist.github.com/sombrafam/69f05404d7097265140e1578ea50c00c
[5] Migration list: https://gist.github.com/sombrafam/39b72e242e27b6a3123603db1faa7b19
[6] Nova.conf dst host: https://gist.github.com/sombrafam/ad43b268e7f4b69e7da513a0f7a0095f
[7] Nova.conf src host: https://gist.github.com/sombrafam/ab27b40e577fbe56d741f01e811f3a18
[8] Package versions: https://gist.github.com/sombrafam/0622792d82750b2141b45580b625b69f
[9] VM info: https://gist.github.com/sombrafam/57eaa4c4ba4b141dec9659ee01f25b6d

affects: nova → charm-nova-compute
summary: - live_migration_permit_post_copy mode does not work
+ live-migration-permit-post-copy mode does not work
description: updated
Changed in charm-nova-compute:
assignee: nobody → Erlon R. Cruz (sombrafam)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (master)
Changed in charm-nova-compute:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/819890
Committed: https://opendev.org/openstack/charm-nova-compute/commit/4c4bc999e9f6d11e1306be254f2a76f827332d40
Submitter: "Zuul (22348)"
Branch: master

commit 4c4bc999e9f6d11e1306be254f2a76f827332d40
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da
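
As a quick check of the condition described in the commit message above, a rendered nova.conf can be inspected along these lines. This is only a sketch: it assumes both options live in the [libvirt] section (as in upstream Nova) and that an unset 'live_migration_timeout_action' behaves as 'abort'. The function name `postcopy_effective` is made up for illustration.

```python
import configparser


def postcopy_effective(nova_conf_text):
    """Return True when post-copy is permitted AND the timeout action is
    'force_complete', the combination the fix says is required."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    if not parser.has_section("libvirt"):
        return False
    permit = parser.getboolean(
        "libvirt", "live_migration_permit_post_copy", fallback=False
    )
    # Assumption: when the option is absent, treat it as 'abort'.
    action = parser.get(
        "libvirt", "live_migration_timeout_action", fallback="abort"
    )
    return permit and action.strip() == "force_complete"
```

It could be called as `postcopy_effective(open("/etc/nova/nova.conf").read())` on each compute host; the path is the usual default and may differ in a given deployment.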

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Felipe Reyes (freyes)
Changed in charm-nova-compute:
milestone: none → 22.04
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/841161

Changed in charm-nova-compute:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/841161
Committed: https://opendev.org/openstack/charm-nova-compute/commit/d04cbcea20915c7db6011c7e53cdc53613ee9744
Submitter: "Zuul (22348)"
Branch: stable/xena

commit d04cbcea20915c7db6011c7e53cdc53613ee9744
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/21.10)

Fix proposed to branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/841338

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/845634

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-nova-compute (stable/21.10)

Change abandoned by "Erlon R. Cruz <email address hidden>" on branch: stable/21.10
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/841338
Reason: this is pointing to a wrong branch

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/845634
Committed: https://opendev.org/openstack/charm-nova-compute/commit/e2daf7121f3acc0b3d814cc8037de0a4a6c6d2c9
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit e2daf7121f3acc0b3d814cc8037de0a4a6c6d2c9
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da
    (cherry picked from commit d04cbcea20915c7db6011c7e53cdc53613ee9744)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/850017

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/850018

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/850018
Committed: https://opendev.org/openstack/charm-nova-compute/commit/152e19debdd7ebdb2cb9f7a8ad51de63606e18b4
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 152e19debdd7ebdb2cb9f7a8ad51de63606e18b4
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da
    (cherry picked from commit d04cbcea20915c7db6011c7e53cdc53613ee9744)
    (cherry picked from commit e2daf7121f3acc0b3d814cc8037de0a4a6c6d2c9)
    (cherry picked from commit 6fb73abcb8e73a8b66149d02a953a10781e01ddf)

tags: added: in-stable-ussuri
tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/850017
Committed: https://opendev.org/openstack/charm-nova-compute/commit/6fb73abcb8e73a8b66149d02a953a10781e01ddf
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 6fb73abcb8e73a8b66149d02a953a10781e01ddf
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da
    (cherry picked from commit d04cbcea20915c7db6011c7e53cdc53613ee9744)
    (cherry picked from commit e2daf7121f3acc0b3d814cc8037de0a4a6c6d2c9)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/charm-nova-compute/+/860888

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/train)

Reviewed: https://review.opendev.org/c/openstack/charm-nova-compute/+/860888
Committed: https://opendev.org/openstack/charm-nova-compute/commit/563064575ddfa09c7e2c837dba5efccb3a520f82
Submitter: "Zuul (22348)"
Branch: stable/train

commit 563064575ddfa09c7e2c837dba5efccb3a520f82
Author: Erlon R. Cruz <email address hidden>
Date: Fri Nov 26 15:33:02 2021 -0300

    Fixes Nova live-migration post-copy

    Live migration post-copy was not working because to be effective,
    'live_migration_timeout_action' must be set to 'force_complete'.

    Closes-bug: #1950894
    Change-Id: I66984a12b89cb0ac2aeebeb393a6f6c026d865da
    (cherry picked from commit d04cbcea20915c7db6011c7e53cdc53613ee9744)
    (cherry picked from commit e2daf7121f3acc0b3d814cc8037de0a4a6c6d2c9)
    (cherry picked from commit 6fb73abcb8e73a8b66149d02a953a10781e01ddf)
    (cherry picked from commit 152e19debdd7ebdb2cb9f7a8ad51de63606e18b4)

tags: added: in-stable-train