nova live migration failed in some cases

Bug #1678577 reported by Jeffrey Zhang
This bug affects 2 people
Affects                  | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | New    | Undecided  | Unassigned  |

Bug Description

Environment: nova 15.0.2 + libvirt + KVM + CentOS

In some situations, the nova request spec becomes:

{"nova_object.version": "1.8", "nova_object.changes": ["instance_uuid", "requested_destination", "retry", "num_instances", "pci_requests", "limits", "availability_zone", "force_nodes", "image", "instance_group", "force_hosts", "numa_topology", "flavor", "project_id", "scheduler_hints", "ignore_hosts"], "nova_object.name": "RequestSpec", "nova_object.data": {"requested_destination": null, "instance_uuid": "ca01b22b-d2d4-4291-96bd-ff6111f1f88b", "retry": {"nova_object.version": "1.1", "nova_object.changes": ["num_attempts", "hosts"], "nova_object.name": "SchedulerRetries", "nova_object.data": {"num_attempts": 1, "hosts": {"nova_object.version": "1.16", "nova_object.changes": ["objects"], "nova_object.name": "ComputeNodeList", "nova_object.data": {"objects": [{"nova_object.version": "1.16", "nova_object.changes": ["host", "hypervisor_hostname"], "nova_object.name": "ComputeNode", "nova_object.data": {"host": "control01", "hypervisor_hostname": "control01"}, "nova_object.namespace": "nova"}]}, "nova_object.namespace": "nova"}}, "nova_object.namespace": "nova"}, "num_instances": 1, "pci_requests": {"nova_object.version": "1.1", "nova_object.name": "InstancePCIRequests", "nova_object.data": {"instance_uuid": "ca01b22b-d2d4-4291-96bd-ff6111f1f88b", "requests": []}, "nova_object.namespace": "nova"}, "limits": {"nova_object.version": "1.0", "nova_object.changes": ["memory_mb", "vcpu", "disk_gb", "numa_topology"], "nova_object.name": "SchedulerLimits", "nova_object.data": {"vcpu": null, "memory_mb": 245427, "disk_gb": 8371, "numa_topology": null}, "nova_object.namespace": "nova"}, "availability_zone": null, "force_nodes": null, "image": {"nova_object.version": "1.8", "nova_object.changes": ["min_disk", "container_format", "min_ram", "disk_format", "properties"], "nova_object.name": "ImageMeta", "nova_object.data": {"min_disk": 1, "container_format": "bare", "min_ram": 0, "disk_format": "raw", "properties": {"nova_object.version": "1.16", "nova_object.name": "ImageMetaProps", 
"nova_object.data": {}, "nova_object.namespace": "nova"}}, "nova_object.namespace": "nova"}, "instance_group": null, "force_hosts": null, "numa_topology": null, "ignore_hosts": null, "flavor": {"nova_object.version": "1.1", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 1, "name": "m1.tiny", "flavorid": "a70249ef-5ea9-49cb-b35f-ab4732064981", "deleted": false, "created_at": "2017-03-22T08:13:48Z", "ephemeral_gb": 0, "updated_at": null, "memory_mb": 256, "vcpus": 1, "extra_specs": {}, "swap": 0, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 119}, "nova_object.namespace": "nova"}, "project_id": "f3c6d500b267432c858c588800b49653", "scheduler_hints": {}}, "nova_object.namespace": "nova"}

Check the retry part:

"retry": {"nova_object.version": "1.1", "nova_object.changes": ["num_attempts", "hosts"], "nova_object.name": "SchedulerRetries", "nova_object.data": {"num_attempts": 1, "hosts": {"nova_object.version": "1.16", "nova_object.changes": ["objects"], "nova_object.name": "ComputeNodeList", "nova_object.data": {"objects": [{"nova_object.version": "1.16", "nova_object.changes": ["host", "hypervisor_hostname"], "nova_object.name": "ComputeNode", "nova_object.data": {"host": "control01", "hypervisor_hostname": "control01"}, "nova_object.namespace": "nova"}]}

It lists control01 as a previously tried host, even though the instance is actually on control02.
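To check the retry part programmatically, the serialized SchedulerRetries primitive can be parsed with the standard json module. This is a minimal sketch, assuming the nesting shown in the dump above (retry -> nova_object.data -> hosts -> nova_object.data -> objects); `RETRY_JSON` is a trimmed copy of that dump and `tried_hosts` is a hypothetical helper, not part of nova.

```python
import json

# Trimmed copy of the "retry" field from the serialized RequestSpec above.
RETRY_JSON = """
{"nova_object.version": "1.1",
 "nova_object.name": "SchedulerRetries",
 "nova_object.data": {
   "num_attempts": 1,
   "hosts": {
     "nova_object.name": "ComputeNodeList",
     "nova_object.data": {
       "objects": [
         {"nova_object.name": "ComputeNode",
          "nova_object.data": {"host": "control01",
                               "hypervisor_hostname": "control01"}}
       ]
     }
   }
 }
}
"""


def tried_hosts(retry_primitive):
    """Return the [host, hypervisor_hostname] pairs recorded in a serialized
    SchedulerRetries object (hypothetical helper for inspection only)."""
    if retry_primitive is None:
        # A RequestSpec with "retry": null has no previously tried hosts.
        return []
    data = retry_primitive["nova_object.data"]
    nodes = data["hosts"]["nova_object.data"]["objects"]
    return [[n["nova_object.data"]["host"],
             n["nova_object.data"]["hypervisor_hostname"]] for n in nodes]


retry = json.loads(RETRY_JSON)
print(retry["nova_object.data"]["num_attempts"], tried_hosts(retry))
# prints: 1 [['control01', 'control01']]
```

Run against the full dump, this makes the mismatch easy to spot: the recorded host is control01 while the instance lives on control02.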

When live migrating this VM from control02 to control01, the migration shows an error in "migration-list". After checking the nova-scheduler logs, I got:

2017-04-02 14:01:47.010 6 DEBUG nova.filters [req-191c8f6e-010b-42f6-acc6-c84c689f649c 2442cfcb9d5c4daf8d90af8bcfe30df7 8eb03bbcdfd84f68b88a7fbaa74e2327 - - -] Starting with 1 host(s) get_filtered_objects /var/lib/kolla/venv/lib/python2.7/site-packages/nova/filters.py:70
2017-04-02 14:01:47.010 6 INFO nova.scheduler.filters.retry_filter [req-191c8f6e-010b-42f6-acc6-c84c689f649c 2442cfcb9d5c4daf8d90af8bcfe30df7 8eb03bbcdfd84f68b88a7fbaa74e2327 - - -] Host [u'control01', u'control01'] fails. Previously tried hosts: [[u'control01', u'control01']]

I think the root cause is the retry part, but I still do not know how it happens.
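The log lines above can be reproduced with a simplified model of the retry check. This is a sketch, assuming the RetryFilter rejects any candidate whose [host, nodename] pair already appears in the request spec's retry hosts, which is what the "Previously tried hosts" message suggests. `HostState`, `Retry` and `retry_filter_passes` are hypothetical stand-ins, not nova's actual classes.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HostState:
    """Hypothetical stand-in for a scheduler candidate host."""
    host: str
    nodename: str


@dataclass
class Retry:
    """Hypothetical stand-in for SchedulerRetries: hosts already tried."""
    num_attempts: int = 0
    hosts: List[List[str]] = field(default_factory=list)


def retry_filter_passes(host_state: HostState, retry: Optional[Retry]) -> bool:
    """Reject a candidate whose [host, nodename] pair was already tried."""
    if retry is None:
        # No retry info recorded: nothing is excluded.
        return True
    candidate = [host_state.host, host_state.nodename]
    passes = candidate not in retry.hosts
    if not passes:
        print(f"Host {candidate} fails. Previously tried hosts: {retry.hosts}")
    return passes


# Retry data as found in the persisted RequestSpec of this report.
stale = Retry(num_attempts=1, hosts=[["control01", "control01"]])
# "Starting with 1 host(s)": control01 is the only destination candidate.
candidates = [HostState("control01", "control01")]
survivors = [h for h in candidates if retry_filter_passes(h, stale)]
print(len(survivors))  # 0: no valid host is left for the live migration
```

Under this model, any retry entry that survives past the scheduling attempt it belongs to can exclude a perfectly healthy destination, which matches the symptom described here.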

Jeffrey Zhang (jeffrey4l) wrote:
tags: added: live-migration
Sean Dague (sdague) wrote:

This isn't really actionable as is.

Changed in nova:
status: New → Incomplete
Launchpad Janitor (janitor) wrote:

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired
Vladislav Belogrudov (vlad-belogrudov) wrote:

It looks like this is dealt with in

https://review.openstack.org/#/c/505771/

I could reproduce the same error by first cold migrating an instance and then trying to live migrate it back with AUTO_SCHEDULE (only two compute nodes in my test).

Changed in nova:
status: Expired → New
Matt Riedemann (mriedem) wrote:

What is AUTO_SCHEDULE? Otherwise, yeah, this looks like a duplicate of bug 1718512.
