IO stuck causes nova compute agent outage

Bug #1691131 reported by Marc Koderer
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description:
============
Due to an overload situation in our storage, one NFS mount got stuck.
All other mount points were accessible and working.
Deleting a VM on this hypervisor was not possible because nova-compute was unresponsive.

The agent was flagged as down (state XXX):
> nova-manage service list
nova-compute de4-2e-ff-0d-44-a4 nova enabled XXX 2017-05-16 11:49:00.577943

The nova-compute service scans all attached volume paths (ephemeral and cinder).
A single stale NFS mount blocks this scan and therefore pauses the whole agent.
With an inactive agent no operations are possible, not even VM deletion.
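
To illustrate the failure mode, here is a minimal Python sketch of probing each path with a time limit, so that one hung mount is reported instead of blocking the caller. It is not nova's code and not the fix that was proposed for this bug; the names probe_path and scan_paths are hypothetical, and it assumes that a plain os.stat call is enough to detect the hang.

```python
import os
import threading


def probe_path(path, timeout=2.0, probe=os.stat):
    """Classify a path as "ok", "error", or "hung".

    The caller never waits longer than `timeout` seconds, even if the
    underlying system call blocks indefinitely.
    """
    outcome = []

    def worker():
        try:
            probe(path)
            outcome.append("ok")
        except OSError:
            # The call failed quickly (e.g. missing path); it did not hang.
            outcome.append("error")

    # Daemon threads are not joined at interpreter exit, so a probe that
    # never returns does not hold up shutdown of the Python side.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        return "hung"
    return outcome[0] if outcome else "error"


def scan_paths(paths, timeout=2.0):
    """Probe paths one by one; a stale mount costs at most `timeout` seconds."""
    return {path: probe_path(path, timeout) for path in paths}
```

A caller could then skip every path reported as "hung" and keep serving requests for the remaining ones. Note that the stuck thread stays blocked in the kernel until the mount recovers, and that in a process which monkey-patches threading (as eventlet-based services may do) a native thread pool would be needed instead.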

Steps to reproduce:
===================

1.) Boot a VM
2.) Attach a volume
3.) Make the NFS backend inaccessible (e.g. using an iptables DROP rule)
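
For step 3, the following Python sketch builds one possible iptables command line. It only constructs the argument list and does not run anything. The helper name nfs_drop_rule, the use of the OUTPUT chain, TCP port 2049, and the example address 192.0.2.10 are assumptions that are not taken from this report; adjust them to the actual backend.

```python
NFS_PORT = 2049  # standard NFS port; assumed, the report does not name one


def nfs_drop_rule(server_ip, delete=False):
    """Return the iptables argv that drops outgoing NFS traffic to server_ip.

    With delete=True the same rule is removed again (-D instead of -A).
    """
    action = "-D" if delete else "-A"
    return [
        "iptables", action, "OUTPUT",
        "-p", "tcp",
        "-d", server_ip,
        "--dport", str(NFS_PORT),
        "-j", "DROP",
    ]


# Print the command for a placeholder address instead of executing it.
print(" ".join(nfs_drop_rule("192.0.2.10")))
```

Running the printed command as root on the hypervisor blocks the mount; calling the helper with delete=True yields the command that restores access.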

Marc Koderer (m-koderer)
summary: - NFS stale causes nova compute agent outage
+ IO stuck causes nova compute agent outage
description: updated
Changed in nova:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/465653

Changed in nova:
assignee: nobody → Daniel Gonzalez Nothnagel (dgonzalez)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Daniel Gonzalez Nothnagel on branch: master
Review: https://review.openstack.org/465653
Reason: As discussed in the nova meeting, this is not the right fix for the problem.
Therefore I'm abandoning this patch.

Thanks Mate, Matthew and Matt for your comments here and in the nova meeting!

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Daniel Gonzalez Nothnagel (dgonzalez) → nobody