Instances that were not successfully evacuated are destroyed when the nova-compute service is restarted.

Bug #1583504 reported by alex on 2016-05-19
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description
===========
Normally, after an instance has been evacuated, nova-compute on the original (faulty) host destroys the local copy of that instance when the service is restarted, based on migration records whose status is "done" or "accepted" in the migrations table. However, if nova-scheduler fails to schedule the evacuation (e.g. no valid host), the migration status is never updated to "failed", so nova-compute destroys the instance by mistake after it restarts.
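For illustration only, here is a rough, self-contained sketch of the selection logic described above (this is not nova's actual code; the real check runs in nova-compute's startup path and its details vary by release):

    # Simplified illustration: on restart, any local instance that has an
    # evacuation migration record still in 'accepted' or 'done' is treated
    # as "already evacuated" and destroyed, even if a scheduling failure
    # means it was never actually rebuilt anywhere else.
    def instances_to_destroy(local_instance_uuids, migrations, this_host):
        evacuated = {
            m['instance_uuid']
            for m in migrations
            if m['migration_type'] == 'evacuation'
            and m['source_compute'] == this_host
            and m['status'] in ('accepted', 'done')  # never updated to 'failed'
        }
        return [uuid for uuid in local_instance_uuids if uuid in evacuated]

With the migration row shown later in this report (status 'accepted', source_compute 'compute1'), the instance on compute1 is selected for destruction even though the evacuation never actually happened.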

Steps to reproduce
==================
1. Create some instances on a specific host.
2. Put that host into a fault state.
3. Disable the nova-compute service on all other hosts, to simulate nova-scheduler failing to schedule.
4. Evacuate the instances from the fault host; the scheduler reports "no valid host".
5. Recover the fault host and restart nova-compute.
6. Check whether the instances still exist in KVM (example commands are sketched below).
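One possible way to walk through these steps, assuming an RDO-style deployment with hosts named compute1 and compute2 (host and service names are taken from the logs below; adjust them for your environment):

    # on the host to be "faulted" (compute1), after booting the instances there
    [root@compute1 ~]# systemctl stop openstack-nova-compute

    # on the controller: make sure no other host can accept the evacuation
    [root@controller ~]# nova service-disable compute2 nova-compute

    # trigger the evacuation; the scheduler fails with "no valid host"
    [root@controller ~]# nova evacuate --on-shared-storage <instance-uuid>

    # recover the fault host and restart nova-compute
    [root@compute1 ~]# systemctl start openstack-nova-compute

    # the guest should still be defined in libvirt/KVM, but it has been destroyed
    [root@compute1 ~]# virsh list --all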

Expected result
===============
All instances should still exist in KVM.

Actual result
=============
Unfortunately, all instances are destroyed by nova-compute.

alex (chinabjalex) on 2016-05-19
Changed in nova:
assignee: nobody → alex (chinabjalex)
Sarafraj Singh (sarafraj-singh) wrote :

Can you please attach logs to this bug?

tags: added: compute
Changed in nova:
status: New → Incomplete
alex (chinabjalex) wrote :

nova-api.log

/var/log/nova/nova-api.log:43606:2016-05-30 14:15:24.424 3987 DEBUG nova.api.openstack.wsgi [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] Action: 'action', calling method: <bound method EvacuateController._evacuate of <nova.api.openstack.compute.evacuate.EvacuateController object at 0x639b310>>, body: {"evacuate": {"onSharedStorage": true}} _process_stack /usr/lib/python2.7/site-packages/nova/api/openstack/wsgi.py:789
/var/log/nova/nova-api.log:43607:2016-05-30 14:15:24.425 3987 DEBUG nova.compute.api [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] [instance: eb46eb30-08ae-4faa-a62e-b5f100bcddf5] Fetching instance by UUID get /usr/lib/python2.7/site-packages/nova/compute/api.py:2077
/var/log/nova/nova-api.log:43608:2016-05-30 14:15:24.462 3987 DEBUG nova.compute.api [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] [instance: eb46eb30-08ae-4faa-a62e-b5f100bcddf5] vm evacuation scheduled evacuate /usr/lib/python2.7/site-packages/nova/compute/api.py:3433
/var/log/nova/nova-api.log:43609:2016-05-30 14:15:24.465 3987 DEBUG nova.servicegroup.drivers.db [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] Seems service is down. Last heartbeat was 2016-05-30 06:14:08. Elapsed time is 76.465787 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:80
/var/log/nova/nova-api.log:43610:2016-05-30 14:15:24.515 3987 DEBUG nova.objects.instance [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] Lazy-loading `flavor' on Instance uuid eb46eb30-08ae-4faa-a62e-b5f100bcddf5 obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:884
/var/log/nova/nova-api.log:43611:2016-05-30 14:15:24.570 3987 DEBUG nova.objects.instance [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] Lazy-loading `flavor' on Instance uuid eb46eb30-08ae-4faa-a62e-b5f100bcddf5 obj_load_attr /usr/lib/python2.7/site-packages/nova/objects/instance.py:884
/var/log/nova/nova-api.log:43612:2016-05-30 14:15:24.617 3987 INFO nova.osapi_compute.wsgi.server [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] 172.16.197.31 "POST /v2/2c2643bad4d6489a99be97fd69c4fdcc/servers/eb46eb30-08ae-4faa-a62e-b5f100bcddf5/action HTTP/1.1" status: 200 len: 191 time: 0.1989682
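For reference, the evacuate request logged above corresponds to an API call along these lines (the token and endpoint are placeholders):

    curl -X POST \
      -H "X-Auth-Token: $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"evacuate": {"onSharedStorage": true}}' \
      http://<nova-api-host>:8774/v2/2c2643bad4d6489a99be97fd69c4fdcc/servers/eb46eb30-08ae-4faa-a62e-b5f100bcddf5/action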

nova-scheduler.log

/var/log/nova/nova-scheduler.log:9490:2016-05-30 14:15:24.642 1425 WARNING nova.scheduler.host_manager [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] No compute service record found for host compute2
/var/log/nova/nova-scheduler.log:9491:2016-05-30 14:15:24.642 1425 INFO nova.scheduler.host_manager [req-72eab156-cfd1-483b-beab-2a164361071c 9646cdbcb7644ee79d80b0b54a6fcebe 2c2643bad4d6489a99be97fd69c4fdcc - - -] Host filter ignoring hosts: compute1
/var/log/nova/nov...


Changed in nova:
status: Incomplete → In Progress
alex (chinabjalex) wrote :

Below is the relevant row in the migrations table from the nova database.
| created_at | updated_at | deleted_at | id | source_compute | dest_compute | dest_host | status | instance_uuid | old_instance_type_id | new_instance_type_id | source_node | dest_node | deleted | migration_type | hidden |

After executing the evacuate action:
| 2016-05-30 06:15:24 | NULL | NULL | 127 | compute1 | NULL | NULL | accepted | eb46eb30-08ae-4faa-a62e-b5f100bcddf5 | NULL | NULL | compute1 | NULL | 0 | evacuation | 0 |

After restarting the nova-compute service on the source node:
| 2016-05-30 06:15:24 | 2016-05-30 06:17:52 | NULL | 127 | compute1 | NULL | NULL | completed | eb46eb30-08ae-4faa-a62e-b5f100bcddf5 | NULL | NULL | compute1 | NULL | 0 | evacuation | 0 |
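For anyone trying to reproduce this, the row above can be inspected with a query along these lines (column names are taken from the header above):

    mysql> SELECT id, source_compute, dest_compute, status, migration_type
           FROM nova.migrations
           WHERE instance_uuid = 'eb46eb30-08ae-4faa-a62e-b5f100bcddf5';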

alex (chinabjalex) wrote :

Below is the instance detail. Its host is still compute1 (the source node).
[root@controller ~]# nova show eb46eb30-08ae-4faa-a62e-b5f100bcddf5
+--------------------------------------+------------------------------------------------------------+
| Property | Value |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | aaaaaaa |
| OS-EXT-SRV-ATTR:host | compute1 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute1 |
| OS-EXT-SRV-ATTR:instance_name | instance-0000003b |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | - |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2016-05-30T06:11:41.000000 |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| admin-net network | 192.168.0.61 |
| config_drive | True |
| created | 2016-05-30T02:53:37Z |
| flavor | mini (100000000) |
| hostId | da6fc182d7a4592413365df255c835eb99d233d8e1c479642caa7a9f |
| id | eb46eb30-08ae-4faa-a62e-b5f100bcddf5 |
| image | cirros-0.3.1-x86_32 (6e1c83af-e57e-44eb-8a7d-0296b5f464ef) |
| key_name | - |
| metadata | {"admin_pass": "a", "admin_user": "root"} |
| name | vm1 |
| os-extended-volumes:volumes_attached | [] |
| os_type | linux |
| progress | 0 |
| security_groups | default |
| status | ACTIVE ...


alex (chinabjalex) on 2016-06-02
Changed in nova:
status: In Progress → New
Changed in nova:
status: New → In Progress

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/318731
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.

Anusha Unnam (anusha-unnam) wrote :

Since there is no activity on this bug and the patch has been abandoned, I am removing the assignee and setting the status to New, because the bug was not confirmed by anyone other than the reporter.

Changed in nova:
assignee: alex (chinabjalex) → nobody
status: In Progress → New
Anusha Unnam (anusha-unnam) wrote :

I tried to reproduce this bug and I think it is valid. I followed the steps below:

Multi-node environment with a controller node and a compute node:

1. Boot an instance vm1 on the compute node.
2. Put the compute node into a fault state.
3. Disable the compute service on the controller node.
4. Evacuate vm1.
5. Recover the fault host and restart the compute service.

As you said, vm1 is deleted on the compute node. I saw these messages in the n-cpu log:
"[instance: 36663d00-fad8-468e-9a13-462b2bbfbd37] Deleting instance as it has been evacuated from this host"
"INFO nova.virt.libvirt.driver [-] [instance: 36663d00-fad8-468e-9a13-462b2bbfbd37] Instance destroyed successfully."

When I ran "nova show vm1", it wrongly showed that the instance is still on the compute node, with a state of 'active' and power_state 'NOSTATE'.
So the VM is not running, and it was not evacuated to the controller node even after I re-enabled the compute service there.

I read that nova evacuate is meant to be used with shared storage, but in my environment I don't have shared storage. I see from your logs that you have "onSharedStorage": true.

Changed in nova:
status: New → Confirmed