[nova] access to /nova vhost not allowed during limited rollout

Bug #1825329 reported by Christian Zunker
This bug affects 1 person
Affects            Status         Importance  Assigned to       Milestone
OpenStack-Ansible  Fix Released   Medium      Mohammed Naser
Rocky              Fix Committed  Undecided   Christian Zunker
Stein              Fix Released   Medium      Mohammed Naser
Train              Fix Released   Medium      Mohammed Naser

Bug Description

OSA Version 18.1.0

We have a three-controller setup with APIs running on each controller and a RabbitMQ cluster spanning all three controllers.
I wanted to roll out a config change to nova and limited the run to only one of the three controllers to reduce downtime.
But the rollout affected all the controllers. In the nova-scheduler logs I got error messages like these:
2019-04-17 17:03:13.080 159781 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova': NotAllowed: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova'
2019-04-17 17:03:13.080 159781 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 172.33.253.218:5671 after None tries: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova': NotAllowed: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova'
2019-04-17 17:03:13.081 159781 ERROR root [-] Unexpected exception occurred 2 time(s)... retrying.: MessageDeliveryFailure: Unable to connect to AMQP server on 172.33.253.218:5671 after None tries: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova'
2019-04-17 17:03:13.081 159781 ERROR root Traceback (most recent call last):
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_utils/excutils.py", line 250, in wrapper
2019-04-17 17:03:13.081 159781 ERROR root return infunc(*args, **kwargs)
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_messaging/_drivers/base.py", line 304, in _runner
2019-04-17 17:03:13.081 159781 ERROR root batch_size=self.batch_size, batch_timeout=self.batch_timeout)
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_messaging/_drivers/base.py", line 53, in wrapper
2019-04-17 17:03:13.081 159781 ERROR root message = func(in_self, timeout=watch.leftover(True))
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py", line 287, in poll
2019-04-17 17:03:13.081 159781 ERROR root self.conn.consume(timeout=min(self._current_timeout, left))
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1069, in consume
2019-04-17 17:03:13.081 159781 ERROR root error_callback=_error_callback)
2019-04-17 17:03:13.081 159781 ERROR root File "/openstack/venvs/nova-18.1.0/lib/python2.7/site-packages/oslo_messaging/_drivers/impl_rabbit.py", line 811, in ensure
2019-04-17 17:03:13.081 159781 ERROR root raise exceptions.MessageDeliveryFailure(msg)
2019-04-17 17:03:13.081 159781 ERROR root MessageDeliveryFailure: Unable to connect to AMQP server on 172.33.253.218:5671 after None tries: Connection.open: (530) NOT_ALLOWED - access to vhost '/nova' refused for user 'nova'
2019-04-17 17:03:13.081 159781 ERROR root
2019-04-17 17:03:13.201 159778 ERROR oslo.messaging._drivers.impl_rabbit [-] [e9252f35-60d2-4886-bd3d-86f77716178c] AMQP server on 172.33.253.218:5671 is unreachable: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted. Trying again in 1 seconds.: ConnectionForced: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted

The above error messages are from the logs of the controller where I changed nova. The other two controllers show the same error messages or a shorter version:
2019-04-17 17:03:13.971 133396 ERROR oslo.messaging._drivers.impl_rabbit [-] [422362d8-061c-482a-bba4-f13ba094dd8f] AMQP server on 172.33.238.196:5671 is unreachable: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted. Trying again in 1 seconds.: ConnectionForced: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted
2019-04-17 17:03:14.006 133389 ERROR oslo.messaging._drivers.impl_rabbit [-] [e76f5060-9113-47dd-bce0-f55a51b09d07] AMQP server on 172.33.253.218:5671 is unreachable: <AMQPError: unknown error>. Trying again in 1 seconds.: RecoverableConnectionError: <AMQPError: unknown error>
2019-04-17 17:03:14.006 133395 ERROR oslo.messaging._drivers.impl_rabbit [-] [f40dfd00-910d-438c-afc3-b8976dc3fd94] AMQP server on 172.33.238.196:5671 is unreachable: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted. Trying again in 1 seconds.: ConnectionForced: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted
2019-04-17 17:03:14.283 133393 INFO oslo.messaging._drivers.impl_rabbit [-] [a9ca1013-61d7-4e3e-a997-5448b146b439] Reconnected to AMQP server on 172.33.253.218:5671 via [amqp] client with port 43504.
2019-04-17 17:03:14.307 133391 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...: ConnectionForced: (0, 0): (320) CONNECTION_FORCED - user 'nova' is deleted

This is reproducible. I ran the playbook for each controller in turn and had the same problem during each rollout:
sudo openstack-ansible os-nova-install.yml -l "ctr1*"
sudo openstack-ansible os-nova-install.yml -l "ctr2*"
sudo openstack-ansible os-nova-install.yml -l "ctr3*"
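
To check quickly whether a given controller was hit during a limited run, the two error signatures quoted above can be counted in a log file. The following is only a minimal sketch: the regular expressions are derived from the log lines in this report, and the function name `classify` and the signature labels are hypothetical, not part of oslo.messaging or OpenStack-Ansible.

```python
import re
from collections import Counter

# Patterns derived from the log lines quoted in this report (assumption:
# other oslo.messaging or RabbitMQ versions may word these differently).
SIGNATURES = {
    "user_deleted": re.compile(
        r"\(320\) CONNECTION_FORCED - user '[^']+' is deleted"
    ),
    "vhost_refused": re.compile(
        r"\(530\) NOT_ALLOWED - access to vhost '[^']+' refused for user '[^']+'"
    ),
}


def classify(lines):
    """Count how many log lines contain each known error signature.

    A line is counted at most once per signature, even if the message
    is repeated within the line.
    """
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Passing an open log file (any iterable of lines) to `classify` returns a counter per signature; non-zero counts on a controller that was excluded with `-l` would match the behaviour described here.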

Mohammed Naser (mnaser) wrote :

This is already fixed here:

https://review.opendev.org/#/c/648887/

Mohammed Naser (mnaser) wrote :

Someone needs to backport it to rocky, though.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_nova (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/656443

OpenStack Infra (hudson-openstack) wrote : Change abandoned on openstack-ansible-os_nova (stable/rocky)

Change abandoned by Christian Zunker (<email address hidden>) on branch: stable/rocky
Review: https://review.opendev.org/656443
Reason: See comment by noonedeadpunk

Christian Zunker (christian-zunker) wrote :

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-tests (stable/rocky)

Reviewed: https://review.opendev.org/656476
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-tests/commit/?id=461fada03f2b5b3d6af23842db029e6f3c520385
Submitter: Zuul
Branch: stable/rocky

commit 461fada03f2b5b3d6af23842db029e6f3c520385
Author: Christian Zunker <email address hidden>
Date: Fri May 17 13:49:18 2019 +0200

    Stop deleting-and-creating RabbitMQ account on deploy

    The `force` option creates and recreates the user which means on
    any deploy it kills *all* AMQP traffic on all agents.

    This updates it when changed every single run, which shouldn't
    kill the connectivity when we run playbooks.

    Change-Id: I1878f702022872190952604caff640bfe6c2ccc1
    Closes-Bug: 1825329
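
The effect described in this commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not RabbitMQ or Ansible code: `ToyBroker`, `ensure_user_forced` and `ensure_user_in_place` are hypothetical names, and the model only encodes what the logs in this report show, namely that deleting a user closes its connections and removes its vhost access until permissions are granted again.

```python
class ToyBroker:
    """Tiny stand-in for a broker's user, permission and connection state."""

    def __init__(self):
        self.passwords = {}       # user -> password
        self.permissions = set()  # (user, vhost) pairs
        self.connections = []     # (user, vhost) of open connections

    def add_user(self, user, password):
        self.passwords[user] = password

    def change_password(self, user, password):
        self.passwords[user] = password

    def grant(self, user, vhost):
        self.permissions.add((user, vhost))

    def connect(self, user, vhost):
        if (user, vhost) not in self.permissions:
            raise PermissionError(
                f"access to vhost '{vhost}' refused for user '{user}'"
            )
        self.connections.append((user, vhost))

    def delete_user(self, user):
        # Assumption based on the logs above: deleting a user drops its
        # permissions and force-closes all of its open connections.
        self.passwords.pop(user, None)
        self.permissions = {p for p in self.permissions if p[0] != user}
        self.connections = [c for c in self.connections if c[0] != user]


def ensure_user_forced(broker, user, password, vhost):
    """Delete and recreate the user on every run (the problematic pattern)."""
    broker.delete_user(user)
    broker.add_user(user, password)
    broker.grant(user, vhost)


def ensure_user_in_place(broker, user, password, vhost):
    """Create the user if missing, otherwise only update it in place."""
    if user in broker.passwords:
        broker.change_password(user, password)
    else:
        broker.add_user(user, password)
    broker.grant(user, vhost)
```

In this model, running `ensure_user_forced` while clients are connected empties `broker.connections` for every client of that user, regardless of which controller triggered the run, whereas `ensure_user_in_place` leaves existing connections untouched.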

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-tests (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/664406

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-tests (master)

Fix proposed to branch: master
Review: https://review.opendev.org/664868

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-tests (master)

Reviewed: https://review.opendev.org/664868
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-tests/commit/?id=5879ae1891049f6464b8c2a5f5990b36eed62954
Submitter: Zuul
Branch: master

commit 5879ae1891049f6464b8c2a5f5990b36eed62954
Author: Dmitriy Rabotjagov <email address hidden>
Date: Wed Jun 12 15:21:13 2019 +0300

    Stop deleting-and-creating RabbitMQ account

    The `force` option creates and recreates the user which means on
    any deploy it kills *all* AMQP traffic on all agents.

    This updates it when changed every single run, which shouldn't
    kill the connectivity when we run playbooks.

    Change-Id: I5c2478b41d49dd3e4392c1800ad6a7e7c1494152
    Closes-Bug: 1825329

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-tests (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/666230

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-tests (stable/rocky)

Reviewed: https://review.opendev.org/664406
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-tests/commit/?id=c53c8de63f6e9c8e94c5dc58a16d79bd03da9cae
Submitter: Zuul
Branch: stable/rocky

commit c53c8de63f6e9c8e94c5dc58a16d79bd03da9cae
Author: Dmitriy Rabotjagov <email address hidden>
Date: Mon Jun 10 20:28:29 2019 +0300

    Stop deleting-and-creating RabbitMQ account

    The `force` option creates and recreates the user which means on
    any deploy it kills *all* AMQP traffic on all agents.

    This updates it when changed every single run, which shouldn't
    kill the connectivity when we run playbooks.

    Change-Id: I5c2478b41d49dd3e4392c1800ad6a7e7c1494152
    Closes-Bug: 1825329

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-tests (stable/stein)

Reviewed: https://review.opendev.org/666230
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-tests/commit/?id=dd7d3681a505a53cf55d54a678fe9881d225a8f1
Submitter: Zuul
Branch: stable/stein

commit dd7d3681a505a53cf55d54a678fe9881d225a8f1
Author: Dmitriy Rabotjagov <email address hidden>
Date: Wed Jun 12 15:21:13 2019 +0300

    Stop deleting-and-creating RabbitMQ account

    The `force` option creates and recreates the user which means on
    any deploy it kills *all* AMQP traffic on all agents.

    This updates it when changed every single run, which shouldn't
    kill the connectivity when we run playbooks.

    Change-Id: I5c2478b41d49dd3e4392c1800ad6a7e7c1494152
    Closes-Bug: 1825329
    (cherry picked from commit 5879ae1891049f6464b8c2a5f5990b36eed62954)
