networking-dpm

RFE: Introduce Error handling for ConnectionError

Bug #1665401 reported by Andreas Scheuring on 2017-02-16

This bug affects 1 person

	Status	Importance	Assigned to	Milestone
networking-dpm	New	High	Andreas Scheuring	networking-dpm pike-1 "p1"
nova-dpm	In Progress	High	Prabhat Ranjan	nova-dpm pike-1 "p1"
os-dpm	Invalid	Undecided	Unassigned

Bug Description

A ConnectionError can occur on any interaction with the HMC. We need a concept for how to deal with such errors.

The following options are conceivable:

- Retry mechanism - retries a failed command up to x times, where x is set via a config option
- Timeout mechanism - for a configurable period, the error is caught instead of propagated. Possible variants:
  * The agent is reported as down, but keeps running until the HMC is back or the timeout is hit. On timeout it terminates
  * Resources are reported as up until the timeout is hit. After that the agent terminates
  * Resources are reported as down until the timeout is hit. After that the agent terminates

We need to think carefully about which options make sense! Maybe all of them, maybe just one, maybe something totally different.
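
To make the two ideas above concrete, here is a minimal sketch, not a proposed implementation. All names in it are hypothetical: `HmcConnectionError` stands in for whatever ConnectionError the HMC client raises, and `max_retries`, `timeout` and `poll_interval` stand in for config options that do not exist yet. It also assumes that "up to x times" means x retries after the first attempt.

```python
import time


class HmcConnectionError(Exception):
    """Hypothetical stand-in for the ConnectionError raised by the HMC client."""


def call_with_retries(func, max_retries, delay=0.0, sleep=time.sleep):
    """Retry mechanism: call func(), retrying up to max_retries times.

    Assumption: max_retries counts retries after the first attempt, so
    func() is called at most max_retries + 1 times.
    """
    attempt = 0
    while True:
        try:
            return func()
        except HmcConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted, let the caller decide
            attempt += 1
            sleep(delay)


def call_until_deadline(func, timeout, poll_interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Timeout mechanism: catch the error until `timeout` seconds have passed.

    While the deadline has not been reached, the error is swallowed and
    func() is tried again. Once the deadline is reached, the error is
    re-raised so the caller (e.g. the agent) can terminate.
    """
    deadline = clock() + timeout
    while True:
        try:
            return func()
        except HmcConnectionError:
            if clock() >= deadline:
                raise
            sleep(poll_interval)
```

What gets reported while the error is being caught (agent down, resources up, resources down) is deliberately left out; that is exactly the open question of this RFE.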

Andreas Scheuring (andreas-scheuring) on 2017-02-16

Changed in os-dpm:
status:	New → Invalid

Markus Zoeller (markus_z) (mzoeller) on 2017-02-17

tags:

added: ocata-rc-potential

Andreas Scheuring (andreas-scheuring) on 2017-02-20

tags:

added: rfe

Markus Zoeller (markus_z) (mzoeller) wrote on 2017-02-21:

Doesn't block the release, but we might want to backport it when it's done in Pike.

tags:

added: ocata-backport-potential
removed: ocata-rc-potential

Markus Zoeller (markus_z) (mzoeller) wrote on 2017-02-21:

That's a design question we didn't discuss at the beginning. We need to do that early in Pike.

Changed in nova-dpm:
importance:	Undecided → High
Changed in networking-dpm:
importance:	Undecided → High

Andreas Scheuring (andreas-scheuring) on 2017-02-24

Changed in networking-dpm:
importance:	High → Wishlist

Sreeram Vancheeswaran (sreeram-vancheeswaran) on 2017-03-15

Changed in nova-dpm:
assignee:	nobody → Prabhat Ranjan (pranjank)

Marco Pavone (pavone) on 2017-04-03

Changed in networking-dpm:
importance:	Wishlist → High

Marco Pavone (pavone) on 2017-04-03

Changed in networking-dpm:
assignee:	nobody → Andreas Scheuring (andreas-scheuring)
milestone:	none → pike-1
Changed in nova-dpm:
milestone:	none → pike-1

Andreas Maier (maiera) wrote on 2017-04-28:

zhmcclient v0.11.0 introduced connection retries. So if the zhmcclient now raises a ConnectionError, that means these retries are exhausted. The connect retries and connect timeouts can be configured using a new RetryTimeoutConfig class. See http://python-zhmcclient.readthedocs.io/en/latest/general.html#session
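
As a rough illustration of how a driver or agent could pass its own settings to the zhmcclient, here is a sketch under stated assumptions. `zhmcclient.RetryTimeoutConfig` and the `retry_timeout_config` argument of `zhmcclient.Session` come from the documentation linked above. The helper names `retry_timeout_kwargs` and `make_session` and the default values are hypothetical and not part of nova-dpm or networking-dpm.

```python
def retry_timeout_kwargs(connect_retries, connect_timeout):
    """Map (hypothetical) config option values to RetryTimeoutConfig kwargs."""
    return {
        "connect_retries": int(connect_retries),
        "connect_timeout": int(connect_timeout),
    }


def make_session(host, userid, password,
                 connect_retries=3, connect_timeout=30):
    """Create an HMC session with explicit connect retry/timeout settings.

    The defaults here are placeholders for illustration only.
    """
    # Imported here so the helper above can be used without zhmcclient.
    import zhmcclient

    rt_config = zhmcclient.RetryTimeoutConfig(
        **retry_timeout_kwargs(connect_retries, connect_timeout))
    return zhmcclient.Session(host, userid, password,
                              retry_timeout_config=rt_config)
```

With this in place, a ConnectionError reaching the caller means the configured connect retries are already used up, so any handling on top of it only has to decide what to do next, not whether to retry the connect.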

Andreas Maier (maiera) wrote on 2017-04-28:

Also, the zhmcclient has always had support for automatic initial logon and re-logon upon expiration of an HMC session. We did not test whether the re-logon works across an HMC reboot, though.

OpenStack Infra (hudson-openstack) wrote on 2017-07-04: Related fix proposed to nova-dpm (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/480097

OpenStack Infra (hudson-openstack) wrote on 2017-07-04: Fix proposed to nova-dpm (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480158

Changed in nova-dpm:
status:	New → In Progress
