multi-node test causes nova-compute to lockup

Bug #1462305 reported by John Garbutt on 2015-06-05
This bug affects 1 person
Affects: OpenStack Compute (nova)
Assigned to: Davanum Srinivas (DIMS)

Bug Description

It's not very clear what's going on here, but here is the symptom.

One of the nova-compute nodes appears to lock up:
It was just completing the termination of an instance:

This is also seen in the scheduler reporting the node as down:

On further inspection it seems like the other nova compute node had just started a migration:

We have had issues in the past where oslo locks can lead to deadlocks; it's not totally clear if that's happening here. All the periodic tasks happen in the same greenlet, so you can stop them from running if you hold a lock in an RPC call that's being processed, etc. No idea if that's happening here though.
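To make the concern above concrete, here is a minimal sketch of a handler that holds a lock while stalled and thereby starves a periodic task that needs the same lock. It is an illustration only: it uses plain standard-library threads rather than eventlet greenlets or oslo locks, and the function bodies, the timings, and the run_demo helper are all hypothetical. It is not how nova implements either code path.

```python
import threading
import time


def run_demo(hang_seconds=1.0, timeout=0.2):
    """Return (blocked, recovered) for a periodic task contending on a lock."""
    instance_lock = threading.Lock()  # stand-in for a per-instance lock
    lock_taken = threading.Event()

    def do_terminate_instance():
        # Hypothetical RPC handler: takes the lock, then stalls while holding it.
        with instance_lock:
            lock_taken.set()
            time.sleep(hang_seconds)

    def periodic_task():
        # Hypothetical periodic task that needs the same lock to make progress.
        acquired = instance_lock.acquire(timeout=timeout)
        if acquired:
            instance_lock.release()
        return acquired

    handler = threading.Thread(target=do_terminate_instance)
    handler.start()
    lock_taken.wait()                  # make sure the handler owns the lock

    blocked = not periodic_task()      # handler still holds the lock
    handler.join()                     # handler finishes and releases it
    recovered = periodic_task()        # the lock is free again
    return blocked, recovered


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the handler eventually releases the lock, so the periodic task recovers. If the handler never returned, every later attempt to take the lock would fail in the same way, which would match a service that stops reporting in.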

Changed in nova:
status: New → Incomplete
assignee: nobody → Joe Gordon (jogo)
tags: added: testing
Joe Gordon (jogo) wrote :

It looks like the delete operation is coming from tempest, but the command never finishes because the lock that do_terminate_instance uses is never released:

' Lock "e701630a-e0f0-4228-ac9b-475604ac3479" acquired by "do_terminate_instance"'

Joe Gordon (jogo) wrote :

Fingerprint: message:"has not been heard from in a while" AND tags:"screen-n-sch.txt" AND build_name:"check-tempest-dsvm-multinode-full"

Joe Gordon (jogo) wrote :

After looking into this further, it looks like this happens on either node in the multinode job, always ending in the same place (an error in delete causing nova-compute to hang).

John Garbutt (johngarbutt) wrote :

Making this high, because it is blocking making the multi-node job voting.

Changed in nova:
importance: Undecided → High

I'm setting it to "in progress" because "jogo" is set as assignee. Or does the combination "incomplete" + "assignee" have a special meaning?

Changed in nova:
status: Incomplete → In Progress
Joe Gordon (jogo) wrote :

Attempted to trigger a guru meditation report (by sending SIGUSR1) on the hung nova-compute, but it doesn't respond.
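For reference, a minimal sketch of what sending SIGUSR1 to a Python process looks like. The handler here only records that the signal arrived; it is a hypothetical stand-in and not the guru meditation report code. The demo signals its own process so that it is self-contained, and it assumes a Unix-like system and that it runs in the main thread.

```python
import os
import signal
import time


def send_report_signal(pid):
    # Equivalent to running `kill -USR1 <pid>` from a shell.
    os.kill(pid, signal.SIGUSR1)


def demo(wait_seconds=1.0):
    """Install a SIGUSR1 handler, signal this process, report what arrived."""
    received = []

    def on_sigusr1(signum, frame):
        # Stand-in for a report generator: just note that the signal arrived.
        received.append(signum)

    previous = signal.signal(signal.SIGUSR1, on_sigusr1)
    try:
        send_report_signal(os.getpid())
        # Python-level handlers run in the main thread between bytecode
        # instructions, so give the interpreter a moment to run ours.
        deadline = time.monotonic() + wait_seconds
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.signal(signal.SIGUSR1, previous)  # restore the old handler
    return received


if __name__ == "__main__":
    print(demo())
```

Because Python runs signal handlers only in the main thread, a process whose main thread never gets back to the interpreter will not react to the signal. Whether that is what happened to the hung nova-compute is not established in this report.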

Joe Gordon (jogo) wrote :

Next step is to attach gdb and get a stack trace.

Joe Gordon (jogo) on 2015-08-26
Changed in nova:
assignee: Joe Gordon (jogo) → nobody
Changed in nova:
status: In Progress → Confirmed
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
status: Confirmed → Invalid