Activity log for bug #1385240

Date Who What changed Old value New value Message
2014-10-24 11:54:41 Giulio Fidente bug added bug
2014-10-24 11:54:53 Giulio Fidente bug task added tripleo
2014-10-24 11:55:04 Giulio Fidente bug task added oslo.messaging
2014-10-24 11:55:11 Giulio Fidente tripleo: importance Undecided High
2014-10-24 11:59:24 Giulio Fidente description
    Old value: Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
    New value: Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 13:29:44 Giulio Fidente description
    Old value: Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
    New value: Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 15:19:28 Giulio Fidente description
    Old value: Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
    New value: Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are more Cinder nodes up and running. Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without it noticing and as a result, being unable to receive updates from api. The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes. This was observed using Kombu 3.0.33 as well as 2.5. Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging
2014-10-24 15:20:58 Ben Nemec tripleo: status New Triaged
2014-10-26 22:59:18 Mike Perez cinder: status New Triaged
2014-10-26 22:59:25 Mike Perez cinder: importance Undecided High
2014-11-10 01:49:00 Koji Iida bug added subscriber Koji Iida
2014-12-03 13:26:25 Mehdi Abaakouk oslo.messaging: status New Incomplete
2014-12-05 12:25:15 Giulio Fidente bug added subscriber Jan Provaznik
2014-12-17 18:00:59 OpenStack Infra tripleo: status Triaged In Progress
2014-12-17 18:00:59 OpenStack Infra tripleo: assignee Giulio Fidente (gfidente)
2014-12-18 21:31:08 OpenStack Infra tripleo: status In Progress Fix Committed
2014-12-24 10:01:12 Derek Higgins tripleo: status Fix Committed Fix Released
2015-03-26 20:26:57 Ivan Kolodyazhny cinder: assignee Ivan Kolodyazhny (e0ne)
2015-06-18 18:17:28 Michal Dulko cinder: status Triaged Fix Released
2018-12-04 22:24:07 Ben Nemec oslo.messaging: status Incomplete Fix Released
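
The bug description mentions "aggressive (low) kernel keepalive probes interval" as a partial mitigation. As a rough illustration only, the same idea can be applied per socket instead of system-wide. The sketch below is not taken from cinder, Kombu, or oslo.messaging; the function name and the timing values (5 s idle, 1 s interval, 5 probes) are hypothetical, and the `TCP_KEEP*` options are only set where the platform exposes them (they are Linux-style options).

```python
import socket


def enable_aggressive_keepalive(sock, idle=5, interval=1, count=5):
    """Turn on TCP keepalive with short probe timers (illustrative values).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    Returns a dict of the options that were actually applied.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    # These constants are missing on some platforms, so look them up defensively.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(enable_aggressive_keepalive(s))
    s.close()
```

With these example values, a silently dead peer would be detected after roughly idle + interval * count seconds. Keepalive only detects a dead TCP connection; as the description notes, an application-level heartbeat in oslo.messaging seems the more appropriate fix.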