multi-node test causes nova-compute to lockup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Compute (nova) | Invalid | High | Davanum Srinivas (DIMS) | | |
Bug Description
It's not very clear what's going on here, but here is the symptom.
One of the nova-compute nodes appears to lock up:
http://
It was just completing the termination of an instance:
http://
This is also seen in the scheduler reporting the node as down:
http://
On further inspection it seems like the other nova compute node had just started a migration:
http://
We have had issues in the past where oslo.locks can lead to deadlocks; it's not totally clear if that's happening here. All the periodic tasks run in the same greenlet, so holding a lock in an RPC call that is being processed can stop them from running. No idea if that's happening here, though.
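To make the suspected failure mode concrete, here is a minimal sketch of a periodic task stalling because an in-flight handler holds the lock it needs. It is an illustration only: it uses stdlib threads rather than greenlets, and `instance_lock`, `rpc_handler`, `periodic_task` and `demo` are hypothetical names, not nova or oslo code.

```python
import threading

# Hypothetical stand-in for a per-instance named lock.
instance_lock = threading.Lock()


def rpc_handler(held, release):
    """Simulate an RPC call that takes the lock and stalls while holding it."""
    with instance_lock:
        held.set()       # tell the caller the lock is now held
        release.wait()   # stall until told to finish


def periodic_task(timeout):
    """Simulate a periodic task that needs the same lock.

    Returns True if the task could run, False if it gave up waiting.
    """
    acquired = instance_lock.acquire(timeout=timeout)
    if acquired:
        instance_lock.release()
    return acquired


def demo():
    held = threading.Event()
    release = threading.Event()
    worker = threading.Thread(target=rpc_handler, args=(held, release))
    worker.start()
    held.wait()                                # handler now owns the lock
    ran_while_held = periodic_task(timeout=0.1)
    release.set()                              # let the handler finish
    worker.join()
    ran_after_release = periodic_task(timeout=0.1)
    return ran_while_held, ran_after_release
```

In this sketch the periodic task gives up after a timeout. If the real code waits on the lock without a timeout, a handler that never returns would block it indefinitely, which would match a node that stops reporting in.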
Changed in nova: | |
status: | New → Incomplete |
assignee: | nobody → Joe Gordon (jogo) |
tags: | added: testing |
Changed in nova: | |
assignee: | Joe Gordon (jogo) → nobody |
Changed in nova: | |
status: | In Progress → Confirmed |
assignee: | nobody → Davanum Srinivas (DIMS) (dims-v) |
status: | Confirmed → Invalid |
It looks like the delete operation is coming from tempest, but the command never finishes because the lock that do_terminate_instance uses is never released.
'Lock "e701630a-e0f0-4228-ac9b-475604ac3479" acquired by "do_terminate_instance"'
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-cpu.txt.gz#_2015-05-29_23_27_47_445
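One way to check a compute log for this pattern is to pair each "acquired by" line with a later release line for the same lock and function. The sketch below is a rough helper, not an existing tool: the "acquired by" format is taken from the line quoted above, while the `released by` format is an assumption about what the matching release message looks like and may need adjusting to the actual log.

```python
import re

# Format taken from the log line quoted in this report.
ACQUIRED = re.compile(r'Lock "(?P<name>[^"]+)" acquired by "(?P<func>[^"]+)"')
# Assumed format for the matching release message; adjust to the real log.
RELEASED = re.compile(r'Lock "(?P<name>[^"]+)" released by "(?P<func>[^"]+)"')


def unreleased_locks(lines):
    """Return sorted (lock name, function) pairs acquired but never released."""
    held = {}
    for line in lines:
        match = ACQUIRED.search(line)
        if match:
            held[(match["name"], match["func"])] = line
            continue
        match = RELEASED.search(line)
        if match:
            # Drop the pair if we saw it acquired earlier; ignore otherwise.
            held.pop((match["name"], match["func"]), None)
    return sorted(held)
```

Run over the lines of screen-n-cpu.txt, any pair it returns is a lock that was taken and not seen released before the log ends, which is what the do_terminate_instance entry above appears to be.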