In the neutron-server log, we first see a NotFound ERROR message from the missing fanout exchange (that's not an issue per se, and I sent a patch to downgrade the message to DEBUG [1]); then the request handler attempts to fanout another event to agents; then the request is blocked for ~115 seconds and ends with:
2017-08-01 22:50:59.696 1787 ERROR oslo.messaging._drivers.impl_rabbit [req-765ebd21-ee44-4db2-85f8-f6b7e52cdf19 tempest-RoutersTest-175057974 tempest-RoutersTest-175057974] [fdb10a5b-b3c8-436c-ba74-193c721329bb] AMQP server on 104.130.216.130:5672 is unreachable: timed out. Trying again in 1 seconds. Client port: None: timeout: timed out
2017-08-01 22:51:00.715 1787 INFO oslo.messaging._drivers.impl_rabbit [req-765ebd21-ee44-4db2-85f8-f6b7e52cdf19 tempest-RoutersTest-175057974 tempest-RoutersTest-175057974] [fdb10a5b-b3c8-436c-ba74-193c721329bb] Reconnected to AMQP server on 104.130.216.130:5672 via [amqp] client with port 51878.
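The pair of log lines above is oslo.messaging's reconnect loop at work: a publish attempt times out, the driver waits for the reported interval and retries until the broker is reachable again. A minimal sketch of that retry pattern (plain Python, not the driver's actual code; `publish_with_retry` and `flaky_publish` are hypothetical names):

```python
import time

def publish_with_retry(publish, retry_interval=1.0, max_attempts=5):
    """Retry `publish` until it succeeds, sleeping between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return publish()
        except TimeoutError:
            if attempt == max_attempts:
                raise
            time.sleep(retry_interval)

# Simulated transport: times out twice, then the broker is reachable again.
attempts = {"n": 0}

def flaky_publish():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("AMQP server unreachable: timed out")
    return "reconnected"

result = publish_with_retry(flaky_publish, retry_interval=0.01)
print(result)  # reconnected
```

The real driver logs each failed attempt (the ERROR line) and the eventual success (the INFO line), which is exactly the shape we see above.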
When we look at rabbitmq logs around the time of failure we see:
=ERROR REPORT==== 1-Aug-2017::22:49:05 ===
Channel error on connection <0.3221.0> (104.130.216.130:43578 -> 104.130.216.130:5672, vhost: '/', user: 'stackrabbit'), channel 1:
{amqp_error,not_found,
"no exchange 'q-agent-notifier-port-delete_fanout' in vhost '/'", 'basic.publish'}
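For context on the exchange name in the error: in oslo.messaging's rabbit driver, the fanout exchange for a topic is named by appending a `_fanout` suffix, so neutron's per-resource agent notification topic maps directly to the exchange rabbitmq complains about (a simplified illustration; the actual naming lives in the driver code):

```python
# neutron's agent notification topic for port deletions
topic = "q-agent-notifier-port-delete"

# oslo.messaging names fanout exchanges "<topic>_fanout"
fanout_exchange = "%s_fanout" % topic

print(fanout_exchange)  # q-agent-notifier-port-delete_fanout
```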
I suspect that after NotFound is raised, oslo.messaging channels get into a state that doesn't allow sending any more messages, and only a connection reset restores connectivity.
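This suspicion can be illustrated with a toy model (pure Python, no real AMQP library; `FakeChannel`/`FakeConnection` are made up for illustration): in AMQP 0-9-1, a channel-level error such as not_found closes the channel, so every later publish on that channel fails until the channel is replaced, which is what a connection reset effectively does.

```python
class ChannelClosed(Exception):
    """Raised when publishing on a channel the broker has closed."""

class FakeChannel:
    """Toy AMQP channel: a not_found error closes the channel."""
    def __init__(self, exchanges):
        self.exchanges = exchanges  # exchanges known to the "broker"
        self.closed = False

    def basic_publish(self, exchange, body):
        if self.closed:
            raise ChannelClosed("channel is closed")
        if exchange not in self.exchanges:
            # AMQP 0-9-1: channel error 404 not_found closes the channel
            self.closed = True
            raise ChannelClosed("no exchange %r" % exchange)

class FakeConnection:
    """Toy connection; reset() replaces the dead channel with a fresh one."""
    def __init__(self, exchanges):
        self.exchanges = exchanges
        self.channel = FakeChannel(exchanges)

    def reset(self):
        self.channel = FakeChannel(self.exchanges)

conn = FakeConnection(exchanges={"q-agent-notifier-port-update_fanout"})

# First publish hits the missing fanout exchange -> not_found, channel closed.
try:
    conn.channel.basic_publish("q-agent-notifier-port-delete_fanout", b"event")
except ChannelClosed:
    pass

# Every later publish fails too, even to an exchange that does exist ...
try:
    conn.channel.basic_publish("q-agent-notifier-port-update_fanout", b"event")
    stuck = False
except ChannelClosed:
    stuck = True

# ... until the connection (and with it the channel) is reset, matching
# the "Reconnected to AMQP server" line in the neutron-server log.
conn.reset()
conn.channel.basic_publish("q-agent-notifier-port-update_fanout", b"event")
print("stuck until reset:", stuck)  # stuck until reset: True
```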
OK, I still see this failing on a neutron api request here: http://logs.openstack.org/01/489701/1/check/gate-grenade-dsvm-ironic-ubuntu-xenial/50785e5/logs/testr_results.html.gz
Later in the rabbitmq log, at the time the request unblocks, we also see the connection being closed abruptly:
=WARNING REPORT==== 1-Aug-2017::22:50:59 ===
closing AMQP connection <0.3221.0> (104.130.216.130:43578 -> 104.130.216.130:5672):
connection_closed_abruptly
[1] https://review.openstack.org/#/c/490139/