Health policy with node poll treats node as unhealthy if nova driver returns exception

Bug #1800038 reported by Duc Truong
This bug affects 1 person
Affects: senlin
Status: Fix Released
Importance: Undecided
Assigned to: Duc Truong
Milestone: (none)

Bug Description

When a health policy uses node poll detection mode, the health manager queries nova for the current status of a node. If that query raises an exception (e.g. a 503 error), the health manager treats the node as unhealthy and proceeds to recover it.

An error encountered while getting the server status from nova should not cause the node to be marked as unhealthy. If the error occurred because nova was unreachable or down, the next iteration of the timer will retry the query. If the node is instead marked as unhealthy while nova-api is only temporarily unreachable, it is recreated unnecessarily.
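
The requested behaviour can be sketched roughly as follows. This is a minimal illustration, not Senlin's actual code: `StatusQueryError`, `get_server_status`, and `node_is_healthy` are hypothetical names, and treating only `ACTIVE` as healthy is an assumption made for the example.

```python
class StatusQueryError(Exception):
    """Hypothetical stand-in for any error raised while querying nova."""


def node_is_healthy(get_server_status, server_id):
    """Return False only when nova positively reports a non-ACTIVE server.

    get_server_status is a hypothetical callable that returns the server
    status string, or raises StatusQueryError (e.g. on a 503 response).
    """
    try:
        status = get_server_status(server_id)
    except StatusQueryError:
        # The query itself failed, so nothing is known about the node.
        # Report it as healthy and let the next timer iteration retry.
        return True
    # Assumption for this sketch: only ACTIVE counts as healthy.
    return status == "ACTIVE"
```

With this shape, a failed query never triggers recovery; only a status that nova actually returned can mark the node as unhealthy.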

Duc Truong (dtruong)
Changed in senlin:
assignee: nobody → Duc Truong (dtruong)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to senlin (master)

Fix proposed to branch: master
Review: https://review.openstack.org/614885

OpenStack Infra (hudson-openstack) wrote : Fix merged to senlin (master)

Reviewed: https://review.openstack.org/614885
Committed: https://git.openstack.org/cgit/openstack/senlin/commit/?id=52d8702274e2dbfdf2d1d7debd8237bda6deb035
Submitter: Zuul
Branch: master

commit 52d8702274e2dbfdf2d1d7debd8237bda6deb035
Author: Duc Truong <email address hidden>
Date: Wed Oct 31 18:01:31 2018 +0000

    Rework health check code

    * Modified health check for node poll status mode to treat a node as
      healthy if it encounters an error getting server status.
    * Simplified NodePollStatusHealthCheck code
    * Added do_healthcheck method separate from do_check to clearly show
      health check behaviour
    * Simplified NodePollUrlHealthCheck code by using tenacity
    * Added more log statements

    Change-Id: I76f0ef95067c81f123bf548c723e93d4cf9c2d49
    Closes-Bug: 1800038
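
The commit message says the URL poll check was simplified by using tenacity. As a rough illustration of the kind of bounded retry such a check might perform, here is a plain-Python sketch that deliberately does not use tenacity. `poll_url_healthy`, `probe`, `retry_limit`, and `retry_interval` are hypothetical names chosen for the example, not taken from the Senlin source.

```python
import time


def poll_url_healthy(probe, retry_limit=3, retry_interval=0.0):
    """Call probe() up to retry_limit times; healthy as soon as one succeeds.

    probe is a hypothetical zero-argument callable that returns True when
    the polled URL responded as expected and False otherwise.
    """
    for attempt in range(1, retry_limit + 1):
        if probe():
            return True
        # Wait between attempts, but not after the final one.
        if attempt < retry_limit:
            time.sleep(retry_interval)
    return False
```

A retry library such as tenacity expresses the same stop and wait conditions declaratively, which is presumably what made the real code simpler.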

Changed in senlin:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/senlin 7.0.0.0b1

This issue was fixed in the openstack/senlin 7.0.0.0b1 development milestone.
