resource_tracker keeps complaining instance not found for deleted instances

Bug #1490855 reported by Zhenzan Zhou
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Eli Qiao

Bug Description

Some instances hit errors during resize or live migration. Since then, resource_tracker keeps logging "instance not found" for them, even after they were deleted.

===== n-cpu.log =====
647686:2015-09-01 07:26:27.172 DEBUG nova.compute.resource_tracker [req-36bf129b-5ed3-403d-9d88-143b0cc3810e None None] Migration instance not found: Instance 0a130eb8-01b5-4ad7-bffd-64abe360e459 could not be found.
647712:2015-09-01 07:26:27.224 DEBUG nova.compute.resource_tracker [req-36bf129b-5ed3-403d-9d88-143b0cc3810e None None] Migration instance not found: Instance 57ec3d85-fd77-4b9a-b4f4-8910e892b612 could not be found.
647738:2015-09-01 07:26:27.277 DEBUG nova.compute.resource_tracker [req-36bf129b-5ed3-403d-9d88-143b0cc3810e None None] Migration instance not found: Instance 7a54fd3b-fee4-4b40-9074-adb515f7fc50 could not be found.

====== db ========
ubuntu@node-01:/opt/stack/nova$ nova list
+----+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+----+------+--------+------------+-------------+----------+
+----+------+--------+------------+-------------+----------+

mysql> select created_at,deleted_at,uuid from instances;
+---------------------+---------------------+--------------------------------------+
| created_at          | deleted_at          | uuid                                 |
+---------------------+---------------------+--------------------------------------+
| 2015-08-28 06:03:36 | 2015-08-31 06:51:18 | 57ec3d85-fd77-4b9a-b4f4-8910e892b612 |
| 2015-08-28 06:04:30 | 2015-08-31 06:51:12 | 0a130eb8-01b5-4ad7-bffd-64abe360e459 |
| 2015-08-28 06:10:56 | 2015-08-28 06:11:14 | 8217c85d-6e74-48b6-940d-7eeb6f7afddd |
| 2015-08-28 06:12:26 | 2015-08-28 06:12:36 | 347218bb-166c-4228-8737-2b7baf310f21 |
| 2015-08-28 06:12:54 | 2015-08-28 06:13:09 | 8815d09a-3b07-4156-a451-d6182ace8b61 |
| 2015-08-28 06:13:27 | 2015-08-28 06:13:39 | 50c40182-ee2f-4553-9a40-828a45ba7026 |
| 2015-08-31 06:54:58 | 2015-09-01 07:10:50 | 7a54fd3b-fee4-4b40-9074-adb515f7fc50 |
| 2015-08-31 07:45:16 | 2015-09-01 07:10:40 | 0b141529-2127-4559-bf88-75de321631cd |
| 2015-09-01 07:12:40 | 2015-09-01 07:14:33 | 99363eb8-682a-4110-ae0b-93e4e7b7c4e9 |
+---------------------+---------------------+--------------------------------------+
9 rows in set (0.00 sec)

mysql> select created_at,deleted_at,status,migration_type,instance_uuid from migrations;
+---------------------+------------+-----------+----------------+--------------------------------------+
| created_at          | deleted_at | status    | migration_type | instance_uuid                        |
+---------------------+------------+-----------+----------------+--------------------------------------+
| 2015-08-28 08:31:10 | NULL       | error     | live-migration | 0a130eb8-01b5-4ad7-bffd-64abe360e459 |
| 2015-08-31 05:58:49 | NULL       | confirmed | resize         | 0a130eb8-01b5-4ad7-bffd-64abe360e459 |
| 2015-08-31 06:36:06 | NULL       | error     | resize         | 0a130eb8-01b5-4ad7-bffd-64abe360e459 |
| 2015-08-31 06:40:56 | NULL       | migrating | resize         | 0a130eb8-01b5-4ad7-bffd-64abe360e459 |
| 2015-08-31 06:49:08 | NULL       | migrating | resize         | 57ec3d85-fd77-4b9a-b4f4-8910e892b612 |
| 2015-08-31 06:56:56 | NULL       | migrating | resize         | 7a54fd3b-fee4-4b40-9074-adb515f7fc50 |
| 2015-08-31 07:07:07 | NULL       | confirmed | resize         | 7a54fd3b-fee4-4b40-9074-adb515f7fc50 |
| 2015-08-31 07:47:30 | NULL       | confirmed | resize         | 0b141529-2127-4559-bf88-75de321631cd |
| 2015-08-31 07:50:14 | NULL       | error     | live-migration | 0b141529-2127-4559-bf88-75de321631cd |
| 2015-08-31 07:52:36 | NULL       | completed | live-migration | 0b141529-2127-4559-bf88-75de321631cd |
| 2015-09-01 07:14:21 | NULL       | error     | resize         | 99363eb8-682a-4110-ae0b-93e4e7b7c4e9 |
+---------------------+------------+-----------+----------------+--------------------------------------+
11 rows in set (0.00 sec)

Eli Qiao (taget-9) wrote :

hi zhenzhan,

I looked at the resource_tracker code; the message comes from a periodic task.

It queries the nova database for all migration objects whose status is not in ['confirmed', 'reverted', 'error'] and updates their status.

In this case, I think you started the migration (nova-compute set the migration object's status to 'migrating') but deleted the instance while the migration was still in the 'migrating' state, so nova-compute never had a chance to update the migration object to the 'error' status.
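To illustrate the behavior described above, here is a minimal sketch in plain Python. It is not nova's code: `FINISHED_STATUSES`, `in_progress` and `periodic_pass` are hypothetical names, and the data is the first five rows of the migrations table in the bug description. It only shows why a migration left in 'migrating' for a deleted instance is reported again on every pass.

```python
# Illustrative model only; none of these names are nova's real API.

# Statuses the periodic task skips, as listed in the comment above.
FINISHED_STATUSES = {"confirmed", "reverted", "error"}

# (status, migration_type, instance_uuid): first five rows of the
# migrations table from the bug description.
MIGRATIONS = [
    ("error", "live-migration", "0a130eb8-01b5-4ad7-bffd-64abe360e459"),
    ("confirmed", "resize", "0a130eb8-01b5-4ad7-bffd-64abe360e459"),
    ("error", "resize", "0a130eb8-01b5-4ad7-bffd-64abe360e459"),
    ("migrating", "resize", "0a130eb8-01b5-4ad7-bffd-64abe360e459"),
    ("migrating", "resize", "57ec3d85-fd77-4b9a-b4f4-8910e892b612"),
]

# UUIDs of instances that still exist; both instances above were deleted.
LIVE_INSTANCES = set()


def in_progress(migrations):
    """Keep only migrations whose status is not a finished one."""
    return [m for m in migrations if m[0] not in FINISHED_STATUSES]


def periodic_pass(migrations, live_instances):
    """Return the messages one pass of the periodic task would log."""
    messages = []
    for _status, _migration_type, uuid in in_progress(migrations):
        if uuid not in live_instances:
            messages.append(
                "Migration instance not found: "
                "Instance %s could not be found." % uuid
            )
    return messages


# Nothing ever changes the 'migrating' rows, so every pass logs the same lines.
first_pass = periodic_pass(MIGRATIONS, LIVE_INSTANCES)
second_pass = periodic_pass(MIGRATIONS, LIVE_INSTANCES)
```

In this model the two 'migrating' rows are reported on every pass, which matches the repeated log lines in n-cpu.log.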

My idea is that the delete path should also take the migration status into account and update it to a proper status.

Eli.

Changed in nova:
status: New → Confirmed
assignee: nobody → Eli Qiao (taget-9)
Eli Qiao (taget-9) wrote :

An alternative: when exception.InstanceNotFound is caught, set the migration status to error so that the RT does not query it again in the next periodic run.
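A rough sketch of that alternative, again in plain Python rather than nova's actual code. The `InstanceNotFound` class here is a local stand-in for nova's exception, and `Migration`, `get_instance` and `periodic_pass` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

# Statuses the periodic task skips, as listed earlier in this thread.
FINISHED_STATUSES = {"confirmed", "reverted", "error"}


class InstanceNotFound(Exception):
    """Local stand-in for nova's exception.InstanceNotFound."""


@dataclass
class Migration:
    instance_uuid: str
    status: str


def get_instance(uuid, instances):
    """Hypothetical lookup that raises when the instance is gone."""
    if uuid not in instances:
        raise InstanceNotFound("Instance %s could not be found." % uuid)
    return instances[uuid]


def periodic_pass(migrations, instances):
    """One pass: complain once, then mark the orphaned migration as error."""
    complaints = []
    for migration in migrations:
        if migration.status in FINISHED_STATUSES:
            continue
        try:
            get_instance(migration.instance_uuid, instances)
        except InstanceNotFound as exc:
            complaints.append("Migration instance not found: %s" % exc)
            # The proposed change: stop tracking this migration.
            migration.status = "error"
    return complaints


# The 'migrating' rows for two of the deleted instances in the report.
migrations = [
    Migration("57ec3d85-fd77-4b9a-b4f4-8910e892b612", "migrating"),
    Migration("7a54fd3b-fee4-4b40-9074-adb515f7fc50", "migrating"),
]
first = periodic_pass(migrations, instances={})
second = periodic_pass(migrations, instances={})
```

Under these assumptions the first pass reports both migrations and the second pass reports none. The sketch does not address quota handling.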

Changed in nova:
importance: Undecided → Low
Eli Qiao (taget-9) wrote :

Another thing to consider: we may need to revert the quota when deleting an instance whose task_state is resizing.

That seems much more complex!

Pawel Koniszewski (pawel-koniszewski) wrote :

Eli, I believe that this fix https://review.openstack.org/#/c/185958/ should solve problems with instances deleted during live migration.

tags: added: live-migrate
Paul Murray (pmurray)
tags: added: live-migration
removed: live-migrate