Activity log for bug #1385240

Date	Who	What changed	Old value	New value	Message
2014-10-24 11:54:41	Giulio Fidente	bug			added bug
2014-10-24 11:54:53	Giulio Fidente	bug task added		tripleo
2014-10-24 11:55:04	Giulio Fidente	bug task added		oslo.messaging
2014-10-24 11:55:11	Giulio Fidente	tripleo: importance	Undecided	High
2014-10-24 11:59:24	Giulio Fidente	description	Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging	Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 13:29:44	Giulio Fidente	description	Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging	Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 15:19:28	Giulio Fidente	description	Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging	Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are more Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without it noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 15:20:58	Ben Nemec	tripleo: status	New	Triaged
2014-10-26 22:59:18	Mike Perez	cinder: status	New	Triaged
2014-10-26 22:59:25	Mike Perez	cinder: importance	Undecided	High
2014-11-10 01:49:00	Koji Iida	bug			added subscriber Koji Iida
2014-12-03 13:26:25	Mehdi Abaakouk	oslo.messaging: status	New	Incomplete
2014-12-05 12:25:15	Giulio Fidente	bug			added subscriber Jan Provaznik
2014-12-17 18:00:59	OpenStack Infra	tripleo: status	Triaged	In Progress
2014-12-17 18:00:59	OpenStack Infra	tripleo: assignee		Giulio Fidente (gfidente)
2014-12-18 21:31:08	OpenStack Infra	tripleo: status	In Progress	Fix Committed
2014-12-24 10:01:12	Derek Higgins	tripleo: status	Fix Committed	Fix Released
2015-03-26 20:26:57	Ivan Kolodyazhny	cinder: assignee		Ivan Kolodyazhny (e0ne)
2015-06-18 18:17:28	Michal Dulko	cinder: status	Triaged	Fix Released
2018-12-04 22:24:07	Ben Nemec	oslo.messaging: status	Incomplete	Fix Released
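
Note on the workaround mentioned in the description: the "aggressive (low) kernel keepalive probes interval" can be illustrated per socket. The following is a minimal sketch, not the fix that was committed for this bug. It assumes a Linux-style TCP stack; the function name `enable_aggressive_keepalive` and the timing values are hypothetical and chosen only for illustration. The description refers to system-wide kernel settings, whereas this sketch sets the equivalent options on a single socket.

```python
import socket


def enable_aggressive_keepalive(sock, idle=5, interval=1, count=5):
    """Enable TCP keepalive on sock with short probe timings.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    All three values are illustrative assumptions, not taken from the bug.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The per-socket tuning options are platform-specific (available on
    # Linux), so each one is applied only if this platform exposes it.
    tuning = (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(enable_aggressive_keepalive(s))
    finally:
        s.close()
```

With settings like these, a peer that silently disappears (for example after a VIP move) would be detected by the kernel after roughly `idle + interval * count` seconds instead of the much longer system default. As the description notes, an application-level heartbeat in the messaging layer is the more appropriate fix; keepalive only shortens detection time at the TCP level.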