Activity log for bug #1588901

Date Who What changed Old value New value Message
2016-06-03 16:20:48 Lucas Alvares Gomes bug added bug
2016-06-03 16:20:54 Lucas Alvares Gomes ironic: assignee Lucas Alvares Gomes (lucasagomes)
2016-06-03 16:23:04 Lucas Alvares Gomes description

Old value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_state_updated_at" field in the nodes (we do have a "provision_state_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_state_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

New value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/
2016-06-03 16:25:42 Lucas Alvares Gomes description

Old value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/

New value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/
2016-06-03 16:26:50 Lucas Alvares Gomes description

Old value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/

New value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/
2016-06-03 16:27:22 Lucas Alvares Gomes ironic: importance Undecided High
2016-06-03 16:28:00 Lucas Alvares Gomes ironic: status New Confirmed
2016-06-07 14:20:37 Lucas Alvares Gomes description

Old value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/

New value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Warn: a possible problem here: how does one conductor clean the reservation from another conductor? We may need *something* here.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/
2016-06-07 14:24:59 Lucas Alvares Gomes description

Old value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Warn: a possible problem here: how does one conductor clean the reservation from another conductor? We may need *something* here.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/

New value:

Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).

If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.

Workaround(s)
=============

* While not ideal, operators can (re)start a conductor service with the same hostname that was managing that node and it will clean up the locks.
* Changing the database manually

Proposed solution
=================

Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state. In order to implement that we would need:

1. A "power_updated_at" field in the nodes (we do have a "provision_updated_at") which will have the time of the last power state change.
2. A periodic task that will query nodes that are reserved by a conductor which is not currently online and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether it's timed out or not. The number of seconds/minutes that we should wait for a timeout should be configurable as a config option.

Warn: a possible problem here: how does one conductor clean the reservation from another conductor? We may need *something* here.

Outputs showing the error
=========================

http://paste.openstack.org/show/507728/
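The periodic task proposed in the final description above can be sketched roughly as follows. This is a minimal, illustrative sketch, not Ironic's actual implementation: the `power_updated_at` field, the `POWER_STATE_TIMEOUT` constant, and the plain-dict node records are assumptions drawn from the bug description (in Ironic the timeout would be a config option and the query would run against the database, and the open question above about clearing another conductor's reservation would still need an answer).

```python
from datetime import datetime, timedelta

# Hypothetical timeout; the description says this should be a config option.
POWER_STATE_TIMEOUT = timedelta(seconds=60)

def find_timed_out_nodes(nodes, online_conductors, now=None):
    """Return nodes stuck mid power transition, per the proposed task.

    A node qualifies when it is reserved by a conductor that is no longer
    online, still has target_power_state set, and its power_updated_at
    timestamp is older than the configured timeout.
    """
    now = now or datetime.utcnow()
    stuck = []
    for node in nodes:
        if node['reservation'] is None:
            continue  # not locked by any conductor
        if node['reservation'] in online_conductors:
            continue  # its conductor is alive; leave the node alone
        if node['target_power_state'] is None:
            continue  # no power transition in flight
        if now - node['power_updated_at'] > POWER_STATE_TIMEOUT:
            stuck.append(node)
    return stuck

def reset_stuck_nodes(nodes, online_conductors):
    """Body of the proposed periodic task: clear the stale reservation
    and target_power_state on every timed-out node."""
    for node in find_timed_out_nodes(nodes, online_conductors):
        node['reservation'] = None
        node['target_power_state'] = None
```

Note that the check against `online_conductors` mirrors the last revision of the description, which narrows the query to nodes reserved by a conductor that is not currently online, so a live conductor's in-flight transition is never touched.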
2016-06-09 12:58:19 OpenStack Infra ironic: status Confirmed In Progress
2017-10-18 16:36:59 Ruby Loo ironic: status In Progress Triaged
2018-02-19 13:49:12 Dmitry Tantsur ironic: assignee Lucas Alvares Gomes (lucasagomes) Dmitry Tantsur (divius)
2018-02-20 19:06:09 OpenStack Infra ironic: status Triaged In Progress
2018-02-21 09:42:37 Olivier Bourdon bug added subscriber Olivier Bourdon
2018-03-16 20:47:07 OpenStack Infra ironic: status In Progress Fix Released
2018-03-30 21:13:01 OpenStack Infra tags in-stable-queens