periodic task for erroring build timeouts tries to set error state on deleted instances

Bug #1501556 reported by Sam Morrison
This bug affects 2 people
Affects                  | Status      | Importance | Assigned to  | Milestone
OpenStack Compute (nova) | In Progress | Low        | Sam Morrison |

Bug Description

In our nova-compute logs we get a ton of these messages, over and over:

2015-10-01 11:01:54.781 30811 WARNING nova.compute.manager [req-f61f4f85-72e7-481b-a8a3-90551bdc4b58 - - - - -] [instance: 75f733b5-842e-4bde-9570-efa2735e6f12] Instance build timed out. Set to error state.

Looking in the DB, they are all deleted:

select deleted_at, deleted, vm_state, task_state from instances where uuid = '75f733b5-842e-4bde-9570-efa2735e6f12';
+---------------------+---------+----------+------------+
| deleted_at | deleted | vm_state | task_state |
+---------------------+---------+----------+------------+
| 2015-08-17 00:47:18 | 12283 | building | deleting |
+---------------------+---------+----------+------------+

We have instance_build_timeout = 3600 (seconds).

I think _check_instance_build_time in compute.manager needs to filter out deleted instances, but there may be a reason it checks deleted instances too.
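A minimal sketch of the filtering suggested above, assuming a simplified in-memory model. The `Instance` dataclass and `find_timed_out_builds()` are hypothetical stand-ins, not nova's objects or the real `_check_instance_build_time` code. An instance counts as timed out once it has been in `building` longer than `instance_build_timeout` seconds, and rows with a non-zero `deleted` value (soft-deleted, like the 12283 above) are skipped.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List


@dataclass
class Instance:
    # Hypothetical, simplified stand-in for a row in the instances table.
    uuid: str
    vm_state: str
    created_at: datetime
    deleted: int = 0  # non-zero means the row has been soft-deleted


def find_timed_out_builds(instances: Iterable[Instance],
                          now: datetime,
                          build_timeout: int) -> List[Instance]:
    """Return non-deleted instances stuck in 'building' past the timeout."""
    if build_timeout <= 0:
        # Assumed convention: a timeout of 0 disables the check.
        return []
    cutoff = now - timedelta(seconds=build_timeout)
    return [
        inst for inst in instances
        if inst.deleted == 0              # skip soft-deleted rows
        and inst.vm_state == "building"
        and inst.created_at < cutoff
    ]


now = datetime(2015, 10, 1, 11, 0, 0)
rows = [
    Instance("75f733b5", "building", datetime(2015, 8, 17), deleted=12283),
    Instance("live-build", "building", now - timedelta(hours=2)),
    Instance("fresh", "building", now - timedelta(minutes=5)),
]
print([i.uuid for i in find_timed_out_builds(rows, now, 3600)])  # ['live-build']
```

Without the `deleted == 0` condition, the soft-deleted row from this report would be returned on every run, which matches the repeated warning in the logs.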

Tags: compute
Hans Lindgren (hanlind) wrote :

Looks like vm_state is 'building' although it should be 'deleted' for a deleted instance.

tags: added: compute
jichenjc (jichenjc) wrote :

Agree, vm_state should be 'DELETED'. Did someone modify the DB directly?
Otherwise task_state = deleting with vm_state = building seems weird.
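To find every row in the state described in the two comments above (soft-deleted, yet `vm_state` is not `deleted`), a query along these lines could be used. This sketch uses Python's built-in `sqlite3` with an in-memory table that mimics only the columns shown earlier; it is not nova's schema, and the two extra rows are hypothetical. Against a real deployment you would run the equivalent SELECT on the actual instances table.

```python
import sqlite3

# In-memory table mimicking only the columns shown in the report
# (not the full nova schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " uuid TEXT, deleted_at TEXT, deleted INTEGER,"
    " vm_state TEXT, task_state TEXT)"
)
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?, ?)",
    [
        # The inconsistent row from the bug report.
        ("75f733b5-842e-4bde-9570-efa2735e6f12", "2015-08-17 00:47:18",
         12283, "building", "deleting"),
        # A cleanly deleted instance, for contrast (hypothetical).
        ("00000000-0000-0000-0000-000000000001", "2015-08-18 10:00:00",
         12284, "deleted", None),
        # A live instance (hypothetical).
        ("00000000-0000-0000-0000-000000000002", None, 0, "active", None),
    ],
)

# Soft-deleted rows (deleted != 0) whose vm_state was never set to 'deleted'.
inconsistent = conn.execute(
    "SELECT uuid, vm_state, task_state FROM instances"
    " WHERE deleted != 0 AND vm_state != 'deleted'"
).fetchall()
print(inconsistent)
# [('75f733b5-842e-4bde-9570-efa2735e6f12', 'building', 'deleting')]
```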

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Chuck Carmack (chuckcarmack75) wrote :

It seems like Delete was called on the instance while it was in building state, and the instance was destroyed but not saved.

https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L2383

I think save was supposed to update the vm_state and task_state columns, while destroy was able to update the deleted_at column.
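The sequence described above can be modelled with a toy record in which `save()` persists `vm_state`/`task_state` and `destroy()` only marks the row deleted. This is a hypothetical illustration (the `Row` and `Record` classes and the `delete_instance()` helper are invented here), not nova's `Instance` object or its delete code path; it only shows how skipping the save leaves exactly the row seen in the bug report.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Row:
    # Hypothetical stand-in for the persisted instances row.
    id: int
    vm_state: str = "building"
    task_state: Optional[str] = "deleting"
    deleted: int = 0
    deleted_at: Optional[str] = None


@dataclass
class Record:
    # In-memory object plus the row it is persisted to.
    row: Row
    pending: Dict[str, Optional[str]] = field(default_factory=dict)

    def set_state(self, vm_state: str, task_state: Optional[str]) -> None:
        # Changes stay in memory until save() is called.
        self.pending = {"vm_state": vm_state, "task_state": task_state}

    def save(self) -> None:
        # Persist vm_state/task_state only.
        for name, value in self.pending.items():
            setattr(self.row, name, value)
        self.pending = {}

    def destroy(self, when: str) -> None:
        # Soft delete: only touches deleted/deleted_at.
        self.row.deleted = self.row.id
        self.row.deleted_at = when


def delete_instance(rec: Record, when: str, save_first: bool) -> Row:
    rec.set_state("deleted", None)
    if save_first:
        rec.save()
    rec.destroy(when)
    return rec.row


print(delete_instance(Record(Row(12283)), "2015-08-17 00:47:18", save_first=False))
# Row(id=12283, vm_state='building', task_state='deleting', deleted=12283, deleted_at='2015-08-17 00:47:18')
print(delete_instance(Record(Row(12283)), "2015-08-17 00:47:18", save_first=True))
# Row(id=12283, vm_state='deleted', task_state=None, deleted=12283, deleted_at='2015-08-17 00:47:18')
```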

Changed in nova:
assignee: nobody → Pushkar Umaranikar (pushkar-umaranikar)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/240017

Changed in nova:
status: Confirmed → In Progress
John Garbutt (johngarbutt) wrote :

I think there is a cells bug where instances get stuck in the deleting state for some time and only eventually heal; I am guessing that is what is exposing this bug.

Sam Morrison (sorrison) wrote :

I don't think this is cells related. This is happening on the compute nodes, against the local compute DB. Cells may be what gets instances into this state in the first place, but the instance build timeout code runs entirely on the compute node, so cells shouldn't play a part here.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/240017
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Maciej Szankin (mszankin) wrote :

This bug report has had an assignee for a while now, but there is no patch
for it. It looks like the chance of getting a patch is low.
I'm going to remove the assignee to signal to others that they can take
over if they like.
If you want to work on this, please:
* add yourself as assignee AND
* set the status to "In Progress" AND
* provide a (WIP) patch within the next 2 weeks after that.
If you need assistance, reach out on the IRC channel #openstack-nova or
use the mailing list.

Changed in nova:
status: In Progress → Confirmed
assignee: Pushkar Umaranikar (pushkar-umaranikar) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/880125

Changed in nova:
status: Confirmed → In Progress
Sam Morrison (sorrison)
Changed in nova:
assignee: nobody → Sam Morrison (sorrison)