2016-06-03 16:20:48 |
Lucas Alvares Gomes |
bug |
|
|
added bug |
2016-06-03 16:20:54 |
Lucas Alvares Gomes |
ironic: assignee |
|
Lucas Alvares Gomes (lucasagomes) |
|
2016-06-03 16:23:04 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_state_updated_at" field in the nodes table (we already have a "provision_state_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_state_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option. |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:25:42 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:26:50 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:27:22 |
Lucas Alvares Gomes |
ironic: importance |
Undecided |
High |
|
2016-06-03 16:28:00 |
Lucas Alvares Gomes |
ironic: status |
New |
Confirmed |
|
2016-06-07 14:20:37 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-07 14:24:59 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved by a conductor that is not currently online and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
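The final description proposes a periodic task that times out stale power transitions on nodes reserved by offline conductors. A minimal Python sketch of that selection-and-reset logic follows; the field names (reservation, target_power_state, power_updated_at) come from the proposal, but the helper names (find_timed_out_nodes, reset_stuck_node, POWER_STATE_TIMEOUT) and the dict-based in-memory nodes are hypothetical stand-ins, not Ironic's actual DB API.

```python
from datetime import datetime, timedelta

# Hypothetical timeout; in Ironic this would come from a config option.
POWER_STATE_TIMEOUT = timedelta(seconds=60)

def find_timed_out_nodes(nodes, online_conductors, now=None):
    """Return nodes whose power transition appears stuck.

    A node qualifies when it is reserved by a conductor that is no
    longer online, has a pending target_power_state, and its
    power_updated_at is older than the configured timeout.
    """
    now = now or datetime.utcnow()
    stuck = []
    for node in nodes:
        if node.get("reservation") is None:
            continue  # not locked, nothing to clean up
        if node["reservation"] in online_conductors:
            continue  # its conductor is alive; leave it alone
        if node.get("target_power_state") is None:
            continue  # no power transition pending
        updated = node.get("power_updated_at")
        if updated is not None and now - updated > POWER_STATE_TIMEOUT:
            stuck.append(node)
    return stuck

def reset_stuck_node(node):
    """Clear the stale lock and the pending power transition."""
    node["reservation"] = None
    node["target_power_state"] = None
    return node
```

A periodic task would then call find_timed_out_nodes on each tick and reset every node it returns; filtering on "conductor not currently online" sidesteps part of the warning above, since a live conductor's reservation is never touched.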
2016-06-09 12:58:19 |
OpenStack Infra |
ironic: status |
Confirmed |
In Progress |
|
2017-10-18 16:36:59 |
Ruby Loo |
ironic: status |
In Progress |
Triaged |
|
2018-02-19 13:49:12 |
Dmitry Tantsur |
ironic: assignee |
Lucas Alvares Gomes (lucasagomes) |
Dmitry Tantsur (divius) |
|
2018-02-20 19:06:09 |
OpenStack Infra |
ironic: status |
Triaged |
In Progress |
|
2018-02-21 09:42:37 |
Olivier Bourdon |
bug |
|
|
added subscriber Olivier Bourdon |
2018-03-16 20:47:07 |
OpenStack Infra |
ironic: status |
In Progress |
Fix Released |
|
2018-03-30 21:13:01 |
OpenStack Infra |
tags |
|
in-stable-queens |
|