We use independent RabbitMQ clusters for each OpenStack project, for Nova cells, and for notifications. Recently, I noticed in our test infrastructure that if the RabbitMQ cluster for notifications has an outage, Nova can't create new instances. Other operations may hang as well.
Not being able to send a notification or connect to the RabbitMQ cluster shouldn't prevent new instances from being created. (If this is actually a use case for some deployments, the operator should have the option to configure that behaviour.)
Tested against the master branch.
If the notification RabbitMQ is stopped, nova-scheduler gets stuck with the following when creating an instance (see the reproduction sketch after the log):
```
Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
(...)
```
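For reference, the hang is easy to reproduce outside of Nova. Here is a minimal sketch (the broker address, publisher id, and topic are placeholders I picked for illustration, not values from our deployment) that points an oslo.messaging notifier at an unreachable RabbitMQ host; the info() call blocks indefinitely with the same EHOSTUNREACH retry loop as above:

```python
# repro.py -- minimal sketch; 203.0.113.1 is a placeholder unreachable host
from oslo_config import cfg
import oslo_messaging

conf = cfg.ConfigOpts()
transport = oslo_messaging.get_notification_transport(
    conf, url='rabbit://guest:guest@203.0.113.1:5672/')
notifier = oslo_messaging.Notifier(
    transport, publisher_id='repro', driver='messaging',
    topics=['notifications'])

# With the broker unreachable, this call never returns: impl_rabbit
# keeps retrying the connection with increasing back-off.
notifier.info({}, 'repro.event', {'hello': 'world'})
```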
Because the notification RabbitMQ cluster is down, the scheduler gets stuck trying to send its notification here:
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85
because oslo.messaging never gives up retrying the connection:
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
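As a possible direction for the configurability mentioned above (a sketch, not the actual Nova code): the oslo.messaging Notifier constructor already accepts a retry argument. If I read the docs and driver correctly, None or -1 means retry forever (the behaviour hit here), 0 means no retry, and N means give up after N attempts and raise MessageDeliveryFailure instead of blocking the caller:

```python
import oslo_messaging

# Same transport setup as in the repro sketch above; retry=2 bounds the
# connection attempts so a broker outage surfaces as an exception the
# caller can handle, rather than hanging the scheduler forever.
notifier = oslo_messaging.Notifier(
    transport, publisher_id='nova-scheduler', driver='messaging',
    topics=['notifications'], retry=2)
```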