RFE: Introduce Error handling for ConnectionError

Bug #1665401 reported by Andreas Scheuring
This bug affects 1 person
Affects         Status       Importance  Assigned to        Milestone
networking-dpm  New          High        Andreas Scheuring
nova-dpm        In Progress  High        Prabhat Ranjan
os-dpm          Invalid      Undecided   Unassigned

Bug Description

A ConnectionError can occur at any time when interacting with the HMC. We need a concept for how to deal with such errors.

The following options are conceivable:

- Retry mechanism - retries a given command up to x times, where x is set via a config option
- Timeout mechanism - the error is caught for a configurable period of time. Possible behaviors during that period:
  * The agent is reported as down but keeps running until the HMC is back or the timeout is hit. Then it terminates
  * Resources are reported as up until the timeout is hit. After that the agent terminates
  * Resources are reported as down until the timeout is hit. After that the agent terminates
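
A minimal sketch of how the retry and timeout options above could be combined, purely for discussion. All names here (`HmcConnectionError`, `call_with_retry`, `max_retries`, `timeout`, `delay`) are hypothetical and are not taken from nova-dpm, networking-dpm, or zhmcclient; in real code the values would come from config options and the exception would be the one raised by the HMC client.

```python
import time


class HmcConnectionError(Exception):
    """Hypothetical stand-in for the ConnectionError raised by the HMC client."""


def call_with_retry(func, max_retries=3, timeout=60.0, delay=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call func(), retrying on HmcConnectionError.

    Gives up (re-raises the last error) once max_retries retries have been
    used or the timeout in seconds has elapsed, whichever comes first.
    """
    deadline = clock() + timeout
    retries = 0
    while True:
        try:
            return func()
        except HmcConnectionError:
            retries += 1
            if retries > max_retries or clock() >= deadline:
                # Retries or time budget exhausted: let the caller decide
                # whether to report resources as down or to terminate.
                raise
            sleep(delay)
```

What the caller does once the error is finally re-raised (report the agent as down, terminate it) is exactly the open design question of this bug; the sketch does not decide it.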

We need to think carefully about which options make sense. Maybe all of them, maybe just one, maybe something totally different.

Changed in os-dpm:
status: New → Invalid
tags: added: ocata-rc-potential
tags: added: rfe
Markus Zoeller (markus_z) (mzoeller) wrote :

Doesn't block the release, but we might want to backport it when it's done in Pike.

tags: added: ocata-backport-potential
removed: ocata-rc-potential
Markus Zoeller (markus_z) (mzoeller) wrote :

That's a design question we didn't discuss at the beginning. We need to do that early in Pike.

Changed in nova-dpm:
importance: Undecided → High
Changed in networking-dpm:
importance: Undecided → High
Changed in networking-dpm:
importance: High → Wishlist
Changed in nova-dpm:
assignee: nobody → Prabhat Ranjan (pranjank)
Changed in networking-dpm:
importance: Wishlist → High
Changed in networking-dpm:
assignee: nobody → Andreas Scheuring (andreas-scheuring)
milestone: none → pike-1
Changed in nova-dpm:
milestone: none → pike-1
Andreas Maier (maiera) wrote :

zhmcclient v0.11.0 introduced connection retries. So if the zhmcclient now raises a ConnectionError, that means these retries are exhausted. The connect retries and connect timeouts can be configured using a new RetryTimeoutConfig class. See http://python-zhmcclient.readthedocs.io/en/latest/general.html#session
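
As a rough illustration of the comment above, the retry and timeout settings might be passed to a session like this. The numeric values and the helper name `make_session` are assumptions made up for this sketch, and the parameter names should be checked against the linked zhmcclient documentation for the version in use.

```python
# Illustrative values only; they are not taken from this bug report.
RETRY_SETTINGS = {
    "connect_timeout": 30,  # seconds to wait for a connection to the HMC
    "connect_retries": 3,   # connection retries before ConnectionError is raised
}


def make_session(host, userid, password):
    """Hypothetical helper: build a zhmcclient session with custom retry settings."""
    # Imported here so the module can be loaded without zhmcclient installed.
    import zhmcclient

    config = zhmcclient.RetryTimeoutConfig(**RETRY_SETTINGS)
    return zhmcclient.Session(host, userid, password,
                              retry_timeout_config=config)
```

With such a setup, a ConnectionError that reaches the driver or agent would mean the configured retries have already been used up, so the remaining question is only how to react at that point.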

Andreas Maier (maiera) wrote :

Also, the zhmcclient has always had support for automatic initial logon and re-logon upon expiration of an HMC session. We did not test whether the re-logon works across an HMC reboot, though.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova-dpm (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/480097

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova-dpm (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480158

Changed in nova-dpm:
status: New → In Progress