NUMA aware live migration failed when vCPU pin set
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Artom Lifshitz |
Train | | High | Dan Smith |
Bug Description
Description
===========
When the vCPU pin policy is 'dedicated', NUMA-aware live migration may fail.
Steps to reproduce
==================
1. Create two flavors: 2c2g.numa and 4c.4g.numa
(venv) [root@t1 ~]# openstack flavor show 2c2g.numa
+------
| Field | Value |
+------
| OS-FLV-
| OS-FLV-
| access_project_ids | None |
| disk | 1 |
| id | b4a2df98-
| name | 2c2g.numa |
| os-flavor-
| properties | hw:cpu_
| ram | 2048 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 2 |
+------
(venv) [root@t1 ~]# openstack flavor show 4c.4g.numa
+------
| Field | Value |
+------
| OS-FLV-
| OS-FLV-
| access_project_ids | None |
| disk | 1 |
| id | cf53f5ea-
| name | 4c.4g.numa |
| os-flavor-
| properties | hw:cpu_
| ram | 4096 |
| rxtx_factor | 1.0 |
| swap | |
| vcpus | 4 |
+------
2. Create four instances (2c2g.numa * 2, 4c.4g.numa * 2)
3. Live migrate the instances one by one
4. After all four instances have been live migrated, check that the vCPU pinning is correct (use 'virsh vcpupin [vm_id]')
5. If the vCPU pinning is correct, go back to step 3 and repeat (a scripted version of this loop is sketched below)
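For convenience, the migrate-and-check loop in steps 3-5 can be scripted. The sketch below is an illustration only and is not from the original report: the server names, the host-to-domain mapping, and the SSH access to the compute nodes are assumptions, and the exact CLI invocations may need adapting to your client versions.

#!/usr/bin/env python3
# Illustrative reproducer for steps 3-5; not part of the original report.
# Assumes admin credentials are loaded in the environment, the legacy 'nova'
# and 'openstack' CLIs are installed, and the compute hosts are reachable
# over SSH so 'virsh vcpupin' can be run there. Names below are placeholders.
import subprocess
import time

SERVERS = ["numa-vm-1", "numa-vm-2", "numa-vm-3", "numa-vm-4"]   # hypothetical
DOMAINS = {"t1": ["instance-00000011", "instance-00000012"]}     # host -> libvirt domains


def server_status(server):
    return subprocess.run(
        ["openstack", "server", "show", server, "-f", "value", "-c", "status"],
        check=True, capture_output=True, text=True).stdout.strip()


def live_migrate(server):
    # Let the scheduler pick the destination host (step 3).
    subprocess.run(["nova", "live-migration", server], check=True)
    while server_status(server) != "ACTIVE":
        time.sleep(5)


while True:
    for server in SERVERS:
        live_migrate(server)
    # Step 4: dump the pinning of every domain on each compute node and
    # inspect it by hand; stop once two instances share the same pCPUs.
    for host, domains in DOMAINS.items():
        for dom in domains:
            out = subprocess.run(["ssh", f"root@{host}", "virsh", "vcpupin", dom],
                                 check=True, capture_output=True, text=True).stdout
            print(f"--- {host} {dom} ---\n{out}")
    if input("Overlapping pinning observed? [y/N] ").strip().lower() == "y":
        break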
Expected result
===============
The vCPU pinning is correct
Actual result
=============
The vCPU pinning is not correct on compute node t1.
(nova-libvirt)
 Id    Name                State
-----------------------------------
 138   instance-00000012   running
 139   instance-00000011   running

(nova-libvirt)
VCPU: CPU Affinity
----------------------------------
   0: 0
   1: 15

(nova-libvirt)
VCPU: CPU Affinity
----------------------------------
   0: 0
   1: 15
Environment
===========
Code version: master, 23 Sep
Three compute nodes:
t1: 16C, 24GB (2 NUMA nodes)
t2: 12C, 16GB (2 NUMA nodes)
t3: 8C, 12GB (2 NUMA nodes)
The image has no properties set.
Hypervisor: Libvirt + KVM
Storage: ceph
Networking type: Neutron + OVS
Logs & Configs
==============
Please check the attachment for the log file.
ya.wang (ya.wang) wrote : | #1 |
tags: | added: numa |
Matt Riedemann (mriedem) wrote : | #2 |
Artom Lifshitz (notartom) wrote : | #3 |
Log analysis notes:
The XML was updated to pin both instances to CPUs 0 and 15, at very different times:
2019-09-24 14:16:14.195 6 DEBUG nova.virt.
<name>
<uuid>
[...]
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="15"/>
2019-09-24 14:16:42.251 6 DEBUG nova.virt.
<name>
<uuid>
[...]
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="15"/>
For the first live migration we create the claims and the NUMAMigrateInfo:
2019-09-24 14:16:08.747 6 DEBUG nova.compute.
2019-09-24 14:16:08.760 6 DEBUG nova.virt.
Same for the second live migration:
2019-09-24 14:16:35.853 6 DEBUG nova.compute.
2019-09-24 14:16:35.861 6 DEBUG nova.virt.
Both claimed host CPUs 0 and 15 - but how/why? What happened between those 2 claims? Going back in time, we see:
The second live migration's claim claims CPUs 0 and 15:
2019-09-24 14:16:34.290 6 DEBUG nova.virt.hardware [req-5aeb2f2d-
[...]
2019-09-24 14:16:34.295 6 DEBUG nova.virt.hardware [req-5aeb2f2d-
Artom Lifshitz (notartom) wrote : | #4 |
Figured it out:
When the update resources periodic task runs, it pulls migrations from the database using [1], which filters out migrations in 'accepted' status. Live migrations are created with an 'accepted' status by the conductor [2], and are only set to 'preparing' by the compute manager here [3], which happens after all the new NUMA-aware live migrations claims stuff. So there's a time window after the claim but before the migration has been set to 'preparing' during which, if the periodic resource update task kicks in, it will miss the migration, see that the instance is still on the source host according to the database, and free its resources from the destination.
[1] https:/
[2] https:/
[3] https:/
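To make the race concrete, here is a minimal, self-contained model of the behaviour described above. It is not Nova's code: the Migration class, the excluded-status set, and the update_available_resource function are simplified stand-ins for the real resource tracker and DB query.

# Toy model of the race described in comment #4; not Nova's actual code.
from dataclasses import dataclass, field


@dataclass
class Migration:
    instance_host: str      # where the DB still says the instance lives (the source)
    dest_host: str
    status: str             # 'accepted' -> 'preparing' -> 'running' -> ...
    claimed_cpus: set = field(default_factory=set)


def in_progress_migrations(migrations, host):
    # Mimics the DB query used by the periodic task. Before the fix,
    # 'accepted' was in the excluded list, so a live migration whose
    # destination claim was already made stayed invisible.
    excluded = {'accepted', 'error', 'failed', 'completed', 'cancelled', 'done'}
    return [m for m in migrations
            if m.dest_host == host and m.status not in excluded]


def update_available_resource(host, instances_on_host, migrations, pinned_cpus):
    # Mimics the periodic task: keep only CPUs owned by instances on this host
    # or by in-progress migrations targeting it, and free everything else.
    keep = set()
    for inst_cpus in instances_on_host.values():
        keep |= inst_cpus
    for m in in_progress_migrations(migrations, host):
        keep |= m.claimed_cpus
    return pinned_cpus & keep


# Destination t1 has claimed CPUs 0 and 15 for an incoming live migration
# that is still in 'accepted' (the claim happens before 'preparing' is set).
mig = Migration(instance_host='t2', dest_host='t1',
                status='accepted', claimed_cpus={0, 15})
print(update_available_resource('t1', {}, [mig], pinned_cpus={0, 15}))
# -> set(): the periodic frees CPUs 0 and 15, so a second migration can
#    claim them again, which is exactly the double pinning seen on t1.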
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → Artom Lifshitz (notartom) |
status: | New → In Progress |
Artom Lifshitz (notartom) wrote : | #6 |
Ya, could you retry your tests with [1] applied, to confirm whether it fixes the issue?
tags: | added: train-rc-potential |
Changed in nova: | |
importance: | Undecided → High |
Fix proposed to branch: stable/train
Review: https:/
no longer affects: | nova/train |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 6ec686c26b2c8b1
Author: Artom Lifshitz <email address hidden>
Date: Tue Sep 24 13:22:23 2019 -0400
Stop filtering out 'accepted' for in-progress migrations
Live migrations are created with an 'accepted' status. Resource claims
on the destination are done with the migration in 'accepted' status.
The status is set to 'preparing' a bit later, right before running
pre_live_migration. Currently, 'accepted' is filtered
out by the database layer when getting in-progress migrations. Thus,
there's a time window after resource claims but before 'preparing'
during which resources have been claimed but the migration is not
considered in-progress by the database layer. During that window, the
instance's host is the source - that's only updated once the live
migration finishes. If the update available resources periodic task
runs during that window, it'll free the instance's resource from the
destination because neither the instance nor any of its in-progress
migrations are associated with the destination. This means that other
incoming instances are able to consume resources that should not be
available. This patch stops filtering out the 'accepted' status in the
database layer when retrieving in-progress migrations.
Change-Id: I4c56925ed35bc3
Closes-bug: 1845146
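Restated schematically, the fix removes 'accepted' from the set of statuses the DB layer treats as not-in-progress. The snippet below only illustrates that shape; the status names are typical terminal migration statuses, not the literal list in Nova's query.

# Shape of the change only; not the literal Nova DB API code.
EXCLUDED_BEFORE = ['accepted', 'error', 'failed', 'completed', 'cancelled', 'done']
EXCLUDED_AFTER = [s for s in EXCLUDED_BEFORE if s != 'accepted']

def is_in_progress(status, excluded):
    return status not in excluded

print(is_in_progress('accepted', EXCLUDED_BEFORE))  # False: claim invisible (the bug)
print(is_in_progress('accepted', EXCLUDED_AFTER))   # True: claim protected (the fix)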
Changed in nova: | |
status: | In Progress → Fix Released |
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: stable/train
commit 45c2ba37bc21370
Author: Artom Lifshitz <email address hidden>
Date: Tue Sep 24 13:22:23 2019 -0400
Stop filtering out 'accepted' for in-progress migrations
Live migrations are created with an 'accepted' status. Resource claims
on the destination are done with the migration in 'accepted' status.
The status is set to 'preparing' a bit later, right before running
pre_live_migration. Currently, 'accepted' is filtered
out by the database layer when getting in-progress migrations. Thus,
there's a time window after resource claims but before 'preparing'
during which resources have been claimed but the migration is not
considered in-progress by the database layer. During that window, the
instance's host is the source - that's only updated once the live
migration finishes. If the update available resources periodic task
runs during that window, it'll free the instance's resource from the
destination because neither the instance nor any of its in-progress
migrations are associated with the destination. This means that other
incoming instances are able to consume resources that should not be
available. This patch stops filtering out the 'accepted' status in the
database layer when retrieving in-progress migrations.
Change-Id: I4c56925ed35bc3
Closes-bug: 1845146
(cherry picked from commit 6ec686c26b2c8b1
Related fix proposed to branch: master
Review: https:/
This issue was fixed in the openstack/nova 20.0.0.0rc2 release candidate.
Reviewed: https:/
Committed: https:/
Submitter: Zuul
Branch: master
commit 32713a4fe885ee5
Author: Artom Lifshitz <email address hidden>
Date: Tue Oct 8 15:23:47 2019 -0400
NUMA LM: Add func test for bug 1845146
Bug 1845146 was caused by the update available resources periodic task
running during a small window in which the migration was in 'accepted'
but resource claims had been done. 'accepted' migrations were not
considered in progress before the fix for 1845146 merged as commit
6ec686c26b, which caused the periodic task to incorrectly free the
migration's resources from the destination. This patch adds a test
that triggers this race by wrapping around the compute manager's
live_migration method (the commit message is actually
wrong in 6ec686c26b, as it talks about 'preparing') and
running the update available resources periodic task while the
migration is still in 'accepted'.
Related bug: 1845146
Change-Id: I78e79112a9c803
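The pattern the test commit describes, wrapping a compute manager method so the update-available-resources periodic runs inside the race window, can be illustrated with a small self-contained example. This is not the merged functional test; the FakeComputeManager class and the toy periodic below are stand-ins invented for the sketch.

# Self-contained illustration of the wrap-and-interleave technique; the
# classes here are toy stand-ins, not Nova objects or the real test.
from unittest import mock


class FakeComputeManager:
    def __init__(self):
        self.migration_status = 'accepted'   # destination claim already made

    def live_migration(self):
        # In Nova the status moves on from 'accepted' around here; for the
        # sketch we only record the transition.
        self.migration_status = 'preparing'


def update_available_resource(manager, claimed_cpus):
    # Toy periodic: frees claims for migrations it does not see as in progress
    # (pre-fix behaviour, where 'accepted' was filtered out).
    if manager.migration_status == 'accepted':
        claimed_cpus.clear()


manager = FakeComputeManager()
claimed_cpus = {0, 15}
original = manager.live_migration


def wrapped(*args, **kwargs):
    # Fire the periodic while the migration is still 'accepted', i.e. inside
    # the window the bug report describes, then continue the migration.
    update_available_resource(manager, claimed_cpus)
    return original(*args, **kwargs)


with mock.patch.object(manager, 'live_migration', side_effect=wrapped):
    manager.live_migration()

print(claimed_cpus)   # set(): the pre-fix race reproduced deterministically

With the fix in place (the periodic no longer ignores 'accepted' migrations), the toy periodic would keep the claim and the final print would show {0, 15}.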
This may be a duplicate of bug 1829349.