2014-10-24 11:54:41 |
Giulio Fidente |
bug |
|
|
added bug |
2014-10-24 11:54:53 |
Giulio Fidente |
bug task added |
|
tripleo |
|
2014-10-24 11:55:04 |
Giulio Fidente |
bug task added |
|
oslo.messaging |
|
2014-10-24 11:55:11 |
Giulio Fidente |
tripleo: importance |
Undecided |
High |
|
2014-10-24 11:59:24 |
Giulio Fidente |
description |
Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api.
The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration.
This was observed using Kombu 3.0.33 as well as 2.5.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging |
Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running.
Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api.
The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration.
This was observed using Kombu 3.0.33 as well as 2.5.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging |
|
2014-10-24 13:29:44 |
Giulio Fidente |
description |
Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running.
Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api.
The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration.
This was observed using Kombu 3.0.33 as well as 2.5.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging |
Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running.
Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api.
The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes.
This was observed using Kombu 3.0.33 as well as 2.5.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging |
|
2014-10-24 15:19:28 |
Giulio Fidente |
description |
Volume operations, eg. create/delete, may remain stuck in 'scheduling' state even though there are working Cinder nodes up and running.
Problem is due to cinder-scheduler getting disconnected from the rabbit cluster without them noticing and as a result, being unable to receive updates from api.
The disconnection may happen following for example a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes.
This was observed using Kombu 3.0.33 as well as 2.5.
Use of some aggressive (low) kernel keepalive probes interval seems to improve the reliability but a more appropriate fix seems to be support for heartbeat in oslo.messaging |
Volume operations, e.g. create/delete, may remain stuck in 'scheduling' state even though there are more Cinder nodes up and running.
The problem is due to cinder-scheduler getting disconnected from the rabbit cluster without it noticing and, as a result, being unable to receive updates from the API.
The disconnection may happen, for example, following a reconfig of a rabbit node, the VIP moving to a different node when rabbit is load balanced, or even _during_ tripleo overcloud deployment due to rabbit cluster configuration changes.
This was observed using Kombu 3.0.33 as well as 2.5.
Using aggressive (low) kernel TCP keepalive probe intervals seems to improve reliability, but a more appropriate fix seems to be support for heartbeat in oslo.messaging. |
|
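The description mentions lowering kernel keepalive probe intervals as a mitigation. The snippet below illustrates that idea on a single Python socket. It is not the tripleo change itself. The timer values (5 s idle, 2 s interval, 3 probes) are assumed examples. `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are Linux socket options that may be missing on other platforms, so the code skips any the platform does not expose. System-wide, the equivalent knobs are the `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes` sysctls.

```python
import socket


def enable_aggressive_keepalive(sock: socket.socket,
                                idle: int = 5,
                                interval: int = 2,
                                probes: int = 3) -> None:
    """Turn on TCP keepalive for sock with short (assumed example) timers.

    On an idle connection, a silently dead peer is detected after roughly
    idle + interval * probes seconds (about 11 s with the defaults here).
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the kernel-wide defaults. These names are
    # Linux-specific, so skip any that this platform does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

Call it on the AMQP client socket before connecting, e.g. `enable_aggressive_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))`. Keepalive only covers idle connections and depends on kernel behaviour. That is why an application-level heartbeat is described as the more appropriate fix.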
2014-10-24 15:20:58 |
Ben Nemec |
tripleo: status |
New |
Triaged |
|
2014-10-26 22:59:18 |
Mike Perez |
cinder: status |
New |
Triaged |
|
2014-10-26 22:59:25 |
Mike Perez |
cinder: importance |
Undecided |
High |
|
2014-11-10 01:49:00 |
Koji Iida |
bug |
|
|
added subscriber Koji Iida |
2014-12-03 13:26:25 |
Mehdi Abaakouk |
oslo.messaging: status |
New |
Incomplete |
|
2014-12-05 12:25:15 |
Giulio Fidente |
bug |
|
|
added subscriber Jan Provaznik |
2014-12-17 18:00:59 |
OpenStack Infra |
tripleo: status |
Triaged |
In Progress |
|
2014-12-17 18:00:59 |
OpenStack Infra |
tripleo: assignee |
|
Giulio Fidente (gfidente) |
|
2014-12-18 21:31:08 |
OpenStack Infra |
tripleo: status |
In Progress |
Fix Committed |
|
2014-12-24 10:01:12 |
Derek Higgins |
tripleo: status |
Fix Committed |
Fix Released |
|
2015-03-26 20:26:57 |
Ivan Kolodyazhny |
cinder: assignee |
|
Ivan Kolodyazhny (e0ne) |
|
2015-06-18 18:17:28 |
Michal Dulko |
cinder: status |
Triaged |
Fix Released |
|
2018-12-04 22:24:07 |
Ben Nemec |
oslo.messaging: status |
Incomplete |
Fix Released |
|
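The description proposes heartbeat support in oslo.messaging as the better fix. The general idea is this: each side periodically sends a small frame and treats the connection as dead when nothing has arrived within a timeout threshold. The client can then reconnect instead of waiting forever on a half-open socket. The sketch below is written independently of oslo.messaging's actual implementation. The class `HeartbeatMonitor`, its `rate` semantics (send interval = timeout / rate) and the 60-second default are hypothetical. The clock is injected so the logic can be exercised without sleeping. For reference, later oslo.messaging releases expose related settings as `heartbeat_timeout_threshold` and `heartbeat_rate` in the `[oslo_messaging_rabbit]` section.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical helper: decides when our heartbeat is due and when the
    peer should be considered dead."""

    def __init__(self, timeout: float = 60.0, rate: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout          # silence longer than this => dead
        self.interval = timeout / rate  # how often we send our own heartbeat
        self.clock = clock
        now = clock()
        self.last_received = now
        self.last_sent = now

    def on_frame_received(self) -> None:
        # Any inbound traffic (data or heartbeat) proves the peer is alive.
        self.last_received = self.clock()

    def should_send_heartbeat(self) -> bool:
        return self.clock() - self.last_sent >= self.interval

    def mark_heartbeat_sent(self) -> None:
        self.last_sent = self.clock()

    def peer_is_dead(self) -> bool:
        # Nothing heard for longer than the threshold: reconnect instead of
        # silently waiting, as cinder-scheduler did in this bug.
        return self.clock() - self.last_received > self.timeout
```

A consumer loop would call `should_send_heartbeat()` and `peer_is_dead()` on each iteration, and reconnect when the latter returns `True`.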