2016-06-03 16:20:48 |
Lucas Alvares Gomes |
bug |
|
|
added bug |
2016-06-03 16:20:54 |
Lucas Alvares Gomes |
ironic: assignee |
|
Lucas Alvares Gomes (lucasagomes) |
|
2016-06-03 16:23:04 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_state_updated_at" field in the nodes table (we already have a "provision_state_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_state_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option. |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:25:42 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:26:50 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the reservation and target_power_state set indefinitely. Currently, Ironic does not have any periodic task that would reset the reservation and target_power_state in that situation.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-03 16:27:22 |
Lucas Alvares Gomes |
ironic: importance |
Undecided |
High |
|
2016-06-03 16:28:00 |
Lucas Alvares Gomes |
ironic: status |
New |
Confirmed |
|
2016-06-07 14:20:37 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
2016-06-07 14:24:59 |
Lucas Alvares Gomes |
description |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
Reported internally at https://bugzilla.redhat.com/show_bug.cgi?id=1342581 (see comments).
If the conductor managing a node dies mid power state transition, that node will have the "reservation" and "target_power_state" fields set indefinitely because Ironic does not have a mechanism (periodic task) to time out nodes based on power state transitions.
Workaround(s)
=============
* While not ideal, operators can (re)start a conductor service with the same hostname as the conductor that was managing that node, and it will clean up the locks.
* Changing the database manually
Proposed solution
=================
Just like we do for certain provision states (*WAIT), we should have a periodic task that would check for a timeout on power state.
In order to implement that we would need:
1. A "power_updated_at" field in the nodes table (we already have a "provision_updated_at") which will hold the time of the last power state change.
2. A periodic task that will query nodes that are reserved by a conductor that is not currently online and have the target_power_state field set; based on the value of the "power_updated_at" field we will know whether the transition has timed out.
The number of seconds/minutes that we should wait before declaring a timeout should be configurable via a config option.
Warning: a possible problem: how does one conductor clean the reservation held by another conductor? We may need *something* here.
Outputs showing the error
=========================
http://paste.openstack.org/show/507728/ |
|
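The final description proposes a periodic task that times out stale power transitions on nodes reserved by offline conductors. A minimal Python sketch of that selection-and-reset logic follows; the field names (reservation, target_power_state, power_updated_at) come from the proposal, but the helper names (find_timed_out_nodes, reset_stuck_node, POWER_STATE_TIMEOUT) and the dict-based in-memory nodes are hypothetical stand-ins, not Ironic's actual DB API.

```python
from datetime import datetime, timedelta

# Hypothetical timeout; in Ironic this would come from a config option.
POWER_STATE_TIMEOUT = timedelta(seconds=60)

def find_timed_out_nodes(nodes, online_conductors, now=None):
    """Return nodes whose power transition appears stuck.

    A node qualifies when it is reserved by a conductor that is no
    longer online, has a pending target_power_state, and its
    power_updated_at is older than the configured timeout.
    """
    now = now or datetime.utcnow()
    stuck = []
    for node in nodes:
        if node.get("reservation") is None:
            continue  # not locked, nothing to clean up
        if node["reservation"] in online_conductors:
            continue  # its conductor is alive; leave it alone
        if node.get("target_power_state") is None:
            continue  # no power transition pending
        updated = node.get("power_updated_at")
        if updated is not None and now - updated > POWER_STATE_TIMEOUT:
            stuck.append(node)
    return stuck

def reset_stuck_node(node):
    """Clear the stale lock and the pending power transition."""
    node["reservation"] = None
    node["target_power_state"] = None
    return node
```

A periodic task would then call find_timed_out_nodes on each tick and reset every node it returns; filtering on "conductor not currently online" sidesteps part of the warning above, since a live conductor's reservation is never touched.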
2016-06-09 12:58:19 |
OpenStack Infra |
ironic: status |
Confirmed |
In Progress |
|
2017-10-18 16:36:59 |
Ruby Loo |
ironic: status |
In Progress |
Triaged |
|
2018-02-19 13:49:12 |
Dmitry Tantsur |
ironic: assignee |
Lucas Alvares Gomes (lucasagomes) |
Dmitry Tantsur (divius) |
|
2018-02-20 19:06:09 |
OpenStack Infra |
ironic: status |
Triaged |
In Progress |
|
2018-02-21 09:42:37 |
Olivier Bourdon |
bug |
|
|
added subscriber Olivier Bourdon |
2018-03-16 20:47:07 |
OpenStack Infra |
ironic: status |
In Progress |
Fix Released |
|
2018-03-30 21:13:01 |
OpenStack Infra |
tags |
|
in-stable-queens |
|