IO stuck causes nova compute agent outage

Bug #1691131 reported by Marc Koderer on 2017-05-16
This bug affects 5 people
Affects: OpenStack Compute (nova)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description:
============
Due to an overload situation in our storage, one NFS mount got stuck.
All other mount points were accessible and working.
Deleting a VM on this hypervisor was not possible because nova-compute was unresponsive.

The agent was flagged as:
> nova-manage service list
nova-compute de4-2e-ff-0d-44-a4 nova enabled XXX 2017-05-16 11:49:00.577943

The nova-compute service scans all attached volume paths (ephemeral and cinder).
A single stale NFS mount blocks this scan and stalls the whole agent.
With an inactive agent no operations are possible, not even VM deletion.
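
To illustrate the failure mode described above, here is a minimal Python sketch of probing a path without letting a hung mount block the caller. It is not nova's code and not the fix that was proposed in review; the helper name stat_completes and the 5-second default are assumptions made for illustration. The os.stat call runs in a daemon thread, so if the mount is stuck the thread stays blocked but the caller gets an answer after the timeout.

```python
import os
import threading


def stat_completes(path, timeout=5.0):
    """Return True if os.stat(path) returns within `timeout` seconds.

    Hypothetical helper, not part of nova. A return value of False means
    the call is still blocked, which is what a stuck NFS mount looks like.
    """
    done = threading.Event()

    def probe():
        try:
            os.stat(path)
        except OSError:
            # An error (missing path, stale handle) still means the call
            # returned, so the mount is not hanging.
            pass
        finally:
            done.set()

    # Daemon thread: a probe stuck in the kernel does not keep the
    # interpreter from exiting later.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout)
```

A caller iterating over volume paths could skip any path for which stat_completes returns False instead of waiting on it. The blocked thread itself cannot be cancelled; it only finishes once the storage responds again.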

Steps to reproduce:
===================

1.) Boot a VM
2.) Attach a volume
3.) Make the NFS backend inaccessible (e.g. using an iptables DROP rule)
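
Step 3 could be sketched as follows. This is an assumption-laden example, not taken from the report: it assumes the NFS traffic uses TCP port 2049, and the server address 192.0.2.10 is a placeholder. The helper names are hypothetical. The functions only build the iptables argument lists; running them requires root on the hypervisor.

```python
def nfs_drop_rule(server_ip, port=2049):
    """Build an iptables command that drops outgoing TCP traffic to the NFS server.

    Hypothetical helper; port 2049 over TCP is an assumption about the setup.
    """
    return ["iptables", "-A", "OUTPUT", "-d", server_ip,
            "-p", "tcp", "--dport", str(port), "-j", "DROP"]


def nfs_drop_rule_undo(server_ip, port=2049):
    """Build the matching command that deletes the rule again (-D instead of -A)."""
    cmd = nfs_drop_rule(server_ip, port)
    cmd[1] = "-D"
    return cmd


# Placeholder address from the documentation range; replace with the real server.
print(" ".join(nfs_drop_rule("192.0.2.10")))
```

The resulting list can be passed to subprocess.run(cmd, check=True) as root; the undo command restores access after the test.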

Marc Koderer (m-koderer) on 2017-05-17
summary: - NFS stale causes nova compute agent outage
+ IO stuck causes nova compute agent outage
description: updated
Changed in nova:
status: New → Confirmed

Fix proposed to branch: master
Review: https://review.openstack.org/465653

Changed in nova:
assignee: nobody → Daniel Gonzalez Nothnagel (dgonzalez)
status: Confirmed → In Progress

Change abandoned by Daniel Gonzalez Nothnagel (<email address hidden>) on branch: master
Review: https://review.openstack.org/465653
Reason: As discussed in the nova meeting, this is not the right fix for the problem.
Therefore I'm abandoning this patch.

Thanks Mate, Matthew and Matt for your comments here and in the nova meeting!

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → Confirmed
assignee: Daniel Gonzalez Nothnagel (dgonzalez) → nobody