2019-09-24 14:16:09.682 6 WARNING nova.compute.resource_tracker [req-a02e1e22-b8bf-418e-a135-85a7374d5b3a - - - - -] Instance 17bcf040-cf68-4ac3-b365-8a77f93af85b has been moved to another host t2(t2). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 2, u'MEMORY_MB': 2048, u'DISK_GB': 1}}.
Looks like the periodic task kicked in before the migration updated the instance's host, and so freed that instance's resources. The question is: an incoming migration should be consuming those resources, so why did they get freed?
Log analysis notes:
The XML was updated to pin both instances to CPUs 0 and 15, at very different times:
2019-09-24 14:16:14.195 6 DEBUG nova.virt.libvirt.migration [-] _update_numa_xml output xml=<domain type="kvm">
<name>instance-00000012</name>
<uuid>17bcf040-cf68-4ac3-b365-8a77f93af85b</uuid>
[...]
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="15"/>
2019-09-24 14:16:42.251 6 DEBUG nova.virt.libvirt.migration [-] _update_numa_xml output xml=<domain type="kvm">
<name>instance-00000011</name>
<uuid>f1929d75-d6ac-45af-b54b-0e10be75d155</uuid>
[...]
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="15"/>
For the first live migration, we create the claim and the LibvirtLiveMigrateNUMAInfo:
2019-09-24 14:16:08.747 6 DEBUG nova.compute.manager [req-9c210c32-614c-4040-abc8-e8a4138d885b f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] [instance: 17bcf040-cf68-4ac3-b365-8a77f93af85b] Created live migration claim. _live_migration_claim /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:6659
2019-09-24 14:16:08.760 6 DEBUG nova.virt.libvirt.driver [req-9c210c32-614c-4040-abc8-e8a4138d885b f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Built NUMA live migration info: LibvirtLiveMigrateNUMAInfo(cell_pins={0=set([0]),1=set([1])},cpu_pins={0=set([0]),1=set([15])},emulator_pins=set([0,15]),sched_priority=<?>,sched_vcpus=<?>) _get_live_migrate_numa_info /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8059
Same for the second live migration:
2019-09-24 14:16:35.853 6 DEBUG nova.compute.manager [req-5aeb2f2d-c69f-473a-8da4-59664d87a214 f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] [instance: f1929d75-d6ac-45af-b54b-0e10be75d155] Created live migration claim. _live_migration_claim /var/lib/kolla/venv/lib/python2.7/site-packages/nova/compute/manager.py:6659
2019-09-24 14:16:35.861 6 DEBUG nova.virt.libvirt.driver [req-5aeb2f2d-c69f-473a-8da4-59664d87a214 f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Built NUMA live migration info: LibvirtLiveMigrateNUMAInfo(cell_pins={0=set([0]),1=set([1])},cpu_pins={0=set([0]),1=set([15])},emulator_pins=set([0,15]),sched_priority=<?>,sched_vcpus=<?>) _get_live_migrate_numa_info /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:8059
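
To decode those LibvirtLiveMigrateNUMAInfo reprs, here's a plain-dict sketch of the field meanings (the real thing is an oslo versioned object):

# How to read the logged object (plain-dict sketch):
numa_info = {
    'cell_pins': {0: {0}, 1: {1}},   # guest NUMA cell -> host NUMA cell(s)
    'cpu_pins': {0: {0}, 1: {15}},   # guest vCPU -> host pCPU(s)
    'emulator_pins': {0, 15},        # host CPUs for the emulator threads
}
# Note that both migrations built identical cpu_pins: vCPU 0 -> pCPU 0 and
# vCPU 1 -> pCPU 15. That's the double-booking we're chasing.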
Both claimed host CPUs 0 and 15 - but how and why? What happened between those two claims? Going back in time, we see:
The second live migration's claim selects CPUs 0 and 15:
2019-09-24 14:16:34.290 6 DEBUG nova.virt.hardware [req-5aeb2f2d-c69f-473a-8da4-59664d87a214 f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Selected cores for pinning: [(0, 0)], in cell 0 _pack_instance_onto_cores /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/hardware.py:979
[...]
2019-09-24 14:16:34.295 6 DEBUG nova.virt.hardware [req-5aeb2f2d-c69f-473a-8da4-59664d87a214 f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Selected cores for pinning: [(1, 15)], in cell 1 _pack_instance_onto_cores /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/hardware.py:979
[...]
2019-09-24 14:16:34.296 6 INFO nova.compute.claims [req-5aeb2f2d-c69f-473a-8da4-59664d87a214 f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] [instance: f1929d75-d6ac-45af-b54b-0e10be75d155] Claim successful on node t1
Before that, the first live migration's claim selects CPUs 0 and 15 as well:
2019-09-24 14:16:07.217 6 DEBUG nova.virt.hardware [req-9c210c32-614c-4040-abc8-e8a4138d885b f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Selected cores for pinning: [(0, 0)], in cell 0 _pack_instance_onto_cores /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/hardware.py:979
[...]
2019-09-24 14:16:07.223 6 DEBUG nova.virt.hardware [req-9c210c32-614c-4040-abc8-e8a4138d885b f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] Selected cores for pinning: [(1, 15)], in cell 1 _pack_instance_onto_cores /var/lib/kolla/venv/lib/python2.7/site-packages/nova/virt/hardware.py:979
[...]
2019-09-24 14:16:07.224 6 INFO nova.compute.claims [req-9c210c32-614c-4040-abc8-e8a4138d885b f28f4213c3a14121b6f7fa15140e7aef 25696a57055b4d6bb5428f45f0473a8c - default default] [instance: 17bcf040-cf68-4ac3-b365-8a77f93af85b] Claim successful on node t1
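
For illustration, a much-simplified sketch of the decision _pack_instance_onto_cores is logging in those "Selected cores for pinning" lines - the real algorithm also handles thread siblings and pinning policies:

# Simplified sketch of per-cell pin selection - not Nova's actual
# _pack_instance_onto_cores.
def pick_pins(guest_vcpus, free_host_cpus):
    # Pair each guest vCPU with the lowest-numbered host CPU that the
    # claim believes is free in that cell.
    return list(zip(guest_vcpus, sorted(free_host_cpus)))

# If both claims see pCPU 0 free in cell 0 and pCPU 15 free in cell 1,
# both select the same pins, matching the logs above:
pick_pins([0], {0})    # -> [(0, 0)] in cell 0
pick_pins([1], {15})   # -> [(1, 15)] in cell 1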
There's one interesting bit between those two claims:
2019-09-24 14:16:09.418 6 DEBUG oslo_concurrency.lockutils [req-a02e1e22-b8bf-418e-a135-85a7374d5b3a - - - - -] Lock "compute_resources" acquired by "nova.compute.resource_tracker._update_available_resource" :: waited 0.000s inner /var/lib/kolla/venv/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:327
[...]
2019-09-24 14:16:09.682 6 WARNING nova.compute.resource_tracker [req-a02e1e22-b8bf-418e-a135-85a7374d5b3a - - - - -] Instance 17bcf040-cf68-4ac3-b365-8a77f93af85b has been moved to another host t2(t2). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'VCPU': 2, u'MEMORY_MB': 2048, u'DISK_GB': 1}}.
This confirms the suspicion above: the _update_available_resource periodic task ran before the first migration updated the instance's host, and so freed that instance's resources - which is why the second claim saw CPUs 0 and 15 as available. But the incoming migration's claim should have been consuming those CPUs, so why did they get freed?
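
A sketch of the suspected race - not the actual _update_available_resource code; free_pins is a made-up stand-in for the usage-freeing path:

# Sketch of the suspected race, not Nova's real periodic task.
def periodic_update_available_resource(tracker, instances):
    # Runs under the "compute_resources" lock, per the lockutils line above.
    for instance in instances:
        if instance.host != tracker.host:
            # The instance has "moved to another host", so its pinned CPUs
            # get freed. If the incoming migration's claim isn't being
            # counted against those CPUs at this point...
            tracker.free_pins(instance)
    # ...then pCPUs 0 and 15 look free again, and the second live
    # migration's claim can select them.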