If evacuation fails, periodic server state poll can loop forever

Bug #1897888 reported by Mark Goddard
Affects    Status         Importance   Assigned to     Milestone
masakari   Fix Released   Undecided    Mark Goddard
Train      Fix Released   Undecided    Unassigned
Ussuri     Fix Released   Undecided    Unassigned
Victoria   Fix Released   Undecided    Unassigned
Wallaby    Fix Released   Undecided    Mark Goddard

Bug Description

Steps to reproduce
==================

Trigger a host failure of a node with instances running on it.

Cause evacuation to fail for some reason. In my case this was caused by using volume encryption, which fails with evacuation since the user used by masakari to trigger evacuation does not have read access to the volume's encryption key in barbican [1].

Expected results
================

Masakari detects the evacuation failure and aborts the failover.

Actual results
==============

The periodic looping call that waits for evacuation (_wait_for_evacuation_confirmation) polls for 90 seconds, then times out. After this point the main thread continues, but the looping call is never stopped and keeps polling forever. The following log message repeats indefinitely:

Call get server command for instance <UUID>
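
The failure mode above can be illustrated with a minimal sketch. This is not masakari's code: it uses only the Python standard library, and PeriodicPoller, wait_for_evacuation and the check callable are hypothetical names chosen for illustration. It shows one way to guarantee that a periodic poll ends when the waiter times out, by stopping the poller in a finally block.

```python
import threading


class PeriodicPoller:
    """Hypothetical stand-in for a fixed-interval looping call."""

    def __init__(self, interval, func):
        self._interval = interval
        self._func = func
        self._stop = threading.Event()   # set to ask the loop to exit
        self._done = threading.Event()   # set when func() reports success
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self

    def _run(self):
        # Poll until func() succeeds or someone calls stop().
        while not self._stop.is_set():
            if self._func():
                self._done.set()
                return
            self._stop.wait(self._interval)

    def wait(self, timeout):
        # Returns True if func() succeeded within the timeout.
        return self._done.wait(timeout)

    def stop(self):
        self._stop.set()
        self._thread.join()


def wait_for_evacuation(check, timeout, interval=0.01):
    """Poll check() until it returns True or the timeout expires."""
    poller = PeriodicPoller(interval, check).start()
    try:
        if not poller.wait(timeout):
            raise TimeoutError("evacuation not confirmed in time")
    finally:
        # Without this stop(), the poller would keep running after the
        # timeout, which is the behaviour reported in this bug.
        poller.stop()
```

Calling wait_for_evacuation with a check that never returns True raises TimeoutError, and no further polls happen afterwards because the poller thread has been stopped and joined.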

Environment
===========
Kolla Ansible
Train
CentOS 8

[1] https://bugs.launchpad.net/nova/+bug/1895848

Mark Goddard (mgoddard) wrote :

This bug may be related to the failure to detect evacuation failover: https://bugs.launchpad.net/masakari/+bug/1859406. This report covers only the missing termination of the looping call.

Mark Goddard (mgoddard)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to masakari (master)

Fix proposed to branch: master
Review: https://review.opendev.org/755283

Changed in masakari:
assignee: nobody → Mark Goddard (mgoddard)
status: New → In Progress
suzhengwei (sue.sam)
Changed in masakari:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 9.1.1

This issue was fixed in the openstack/masakari 9.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 10.0.1

This issue was fixed in the openstack/masakari 10.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 8.1.1

This issue was fixed in the openstack/masakari 8.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/masakari 11.0.0.0rc1

This issue was fixed in the openstack/masakari 11.0.0.0rc1 release candidate.
