Activity log for bug #1917645

Date Who What changed Old value New value Message
2021-03-03 16:21:14 Belmiro Moreira bug added bug
2021-03-03 16:29:26 Belmiro Moreira description The old and new values are identical except that "Because the notification RabbitMQ cluster is down, it fails to send" became "Because the notification RabbitMQ cluster is down, Nova gets stuck in". New value:

We use independent RabbitMQ clusters for each OpenStack project, for Nova Cells, and also for notifications. Recently I noticed in our test infrastructure that if the RabbitMQ cluster for notifications has an outage, Nova can't create new instances. Possibly other operations will also hang. Not being able to send a notification / connect to the notification RabbitMQ cluster shouldn't prevent new instances from being created. (If this is actually a use case for some deployments, the operator should be able to configure that behavior.) Tested against the master branch.

If the notification RabbitMQ is stopped, nova-scheduler gets stuck with the following when creating an instance:

```
Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
(...)
```

Because the notification RabbitMQ cluster is down, Nova gets stuck in:
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85
because oslo.messaging never gives up:
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
(A simplified sketch of this retry loop follows the activity log.)
2021-03-04 09:11:43 Balazs Gibizer nova: assignee Balazs Gibizer (balazs-gibizer)
2021-03-04 09:11:47 Balazs Gibizer nova: status New Fix Committed
2021-03-04 09:11:52 Balazs Gibizer nova: status Fix Committed Confirmed
2021-03-04 09:11:58 Balazs Gibizer nova: importance Undecided Medium
2021-03-04 09:12:05 Balazs Gibizer tags notifications
2021-11-23 14:57:46 Mohammed Naser bug task added oslo.messaging
2021-11-24 16:17:36 Balazs Gibizer oslo.messaging: assignee Balazs Gibizer (balazs-gibizer)
2021-11-24 16:20:18 OpenStack Infra oslo.messaging: status New In Progress
2022-01-12 15:29:44 OpenStack Infra oslo.messaging: status In Progress Fix Released
2022-02-09 17:33:08 OpenStack Infra tags notifications in-stable-xena notifications
2022-04-05 11:24:03 OpenStack Infra tags in-stable-xena notifications in-stable-wallaby in-stable-xena notifications
2022-04-27 17:11:12 Balazs Gibizer nova: status Confirmed Invalid
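The hang described in the bug report comes from a connection retry loop with no limit on the number of attempts. The following is a minimal Python sketch, not the actual oslo.messaging driver code: it only reproduces the pattern visible in the quoted log (a back-off growing by 2 seconds per attempt, retrying forever), which is why the scheduler call that tries to emit the notification blocks for as long as the notification RabbitMQ cluster is unreachable. The hostname and helper names are hypothetical.

```python
import socket
import time


def ensure_connection(connect, interval_start=2.0, interval_step=2.0,
                      interval_max=30.0):
    """Retry `connect` forever with a growing back-off (2s, 4s, 6s, ...)."""
    interval = interval_start
    while True:
        try:
            return connect()
        except OSError as exc:  # e.g. [Errno 113] EHOSTUNREACH, as in the log
            print("Connection failed: %s (retrying in %.1f seconds)" % (exc, interval))
            time.sleep(interval)
            interval = min(interval + interval_step, interval_max)


def connect_to_notification_rabbit():
    # Hypothetical stand-in for opening a connection to the notification
    # RabbitMQ cluster; the hostname is made up for this example.
    return socket.create_connection(("rabbit-notifications.example.org", 5672),
                                    timeout=5)


# With no cap on the number of attempts, this call never returns while the
# notification cluster is down, so the caller that tried to send the
# notification appears to hang indefinitely.
# connection = ensure_connection(connect_to_notification_rabbit)
```

Per the status changes above, the issue was ultimately resolved on the oslo.messaging side (Fix Released on 2022-01-12), and the nova task was closed as Invalid; the sketch only illustrates the reported failure mode.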