Comment 16 for bug 1707160

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

OK, I still see this failing on a neutron api request here: http://logs.openstack.org/01/489701/1/check/gate-grenade-dsvm-ironic-ubuntu-xenial/50785e5/logs/testr_results.html.gz

In neutron-server log, we see first NotFound ERROR message from the missing fanout exchange (that's not an issue per se, and I sent a patch to downgrade the message to DEBUG [1]); then the request handler attempts to fanout another event to agents; then request is locked for ~115 seconds to end with:

2017-08-01 22:50:59.696 1787 ERROR oslo.messaging._drivers.impl_rabbit [req-765ebd21-ee44-4db2-85f8-f6b7e52cdf19 tempest-RoutersTest-175057974 tempest-RoutersTest-175057974] [fdb10a5b-b3c8-436c-ba74-193c721329bb] AMQP server on 104.130.216.130:5672 is unreachable: timed out. Trying again in 1 seconds. Client port: None: timeout: timed out
2017-08-01 22:51:00.715 1787 INFO oslo.messaging._drivers.impl_rabbit [req-765ebd21-ee44-4db2-85f8-f6b7e52cdf19 tempest-RoutersTest-175057974 tempest-RoutersTest-175057974] [fdb10a5b-b3c8-436c-ba74-193c721329bb] Reconnected to AMQP server on 104.130.216.130:5672 via [amqp] client with port 51878.

When we look at rabbitmq logs around the time of failure we see:

=ERROR REPORT==== 1-Aug-2017::22:49:05 ===
Channel error on connection <0.3221.0> (104.130.216.130:43578 -> 104.130.216.130:5672, vhost: '/', user: 'stackrabbit'), channel 1:
{amqp_error,not_found,
            "no exchange 'q-agent-notifier-port-delete_fanout' in vhost '/'",
            'basic.publish'}

=WARNING REPORT==== 1-Aug-2017::22:50:59 ===
closing AMQP connection <0.3221.0> (104.130.216.130:43578 -> 104.130.216.130:5672):
connection_closed_abruptly

I suspect that after NotFound is raised, oslo.messaging channels get into a state that doesn't allow to send any more messages, and only connection reset restores connectivity.

[1] https://review.openstack.org/#/c/490139/