Grenade Failure - MessagingTimeout on floating IP remove

Bug #1545002 reported by Davanum Srinivas (DIMS)
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Won't Fix
Undecided
Unassigned
oslo.messaging
Fix Released
High
Davanum Srinivas (DIMS)

Bug Description

Logstash query:
message:"<class 'oslo_messaging.exceptions.MessagingTimeout'> (HTTP 500)"

Grenade failure:
2016-02-12 10:31:02.846 | + openstack ip floating remove 172.24.5.2 cinder_server1
2016-02-12 10:32:05.422 | Unexpected API Error. Please report this at http://bugs.launchpad.net/nova/ and attach the Nova API log if possible.
2016-02-12 10:32:05.422 | <class 'oslo_messaging.exceptions.MessagingTimeout'> (HTTP 500) (Request-ID: req-b1f17c4c-b814-4a21-9209-13db7915370f)
2016-02-12 10:32:05.443 | + exit_trap

Example from: http://logs.openstack.org/12/278012/3/check/gate-grenade-dsvm/ac87c30/logs/grenade.sh.txt.gz

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Changed in oslo.messaging:
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (4.3 KiB)

The failure in n-api is here:

http://logs.openstack.org/12/278012/3/check/gate-grenade-dsvm/ac87c30/logs/new/screen-n-api.txt.gz?level=TRACE#_2016-02-12_10_32_05_378

That timeout is at 10:32:05. Around the same time in the n-net logs we have this:

http://logs.openstack.org/12/278012/3/check/gate-grenade-dsvm/ac87c30/logs/new/screen-n-net.txt.gz#_2016-02-12_10_31_05_380

2016-02-12 10:31:05.380 DEBUG oslo_concurrency.lockutils [req-b1f17c4c-b814-4a21-9209-13db7915370f cinder_grenade cinder_grenade] Lock "172.24.5.2" acquired by "nova.network.floating_ips.do_disassociate" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270
2016-02-12 10:32:06.595 22979 DEBUG oslo.messaging._drivers.impl_rabbit [-] Received recoverable error from kombu: on_error /usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py:654
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 436, in _ensured
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, **kwargs)
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 508, in __call__
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit return fun(*args, channel=channels[0], **kwargs), channels[0]
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 704, in execute_method
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit method()
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/impl_rabbit.py", line 1050, in _publish
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit producer.publish(msg, expiration=timeout)
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 172, in publish
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit routing_key, mandatory, immediate, exchange, declare)
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/messaging.py", line 188, in _publish
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit mandatory=mandatory, immediate=immediate,
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/channel.py", line 2132, in basic_publish_confirm
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit self.wait([(60, 80), (60, 120)])
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 67, in wait
2016-02-12 10:32:06.595 22979 ERROR oslo.messaging._d...

Read more...

Changed in oslo.messaging:
importance: Critical → High
Revision history for this message
Matt Riedemann (mriedem) wrote :
summary: - Grenade Failure - ip floating remove
+ Grenade Failure - MessagingTimeout on floating IP remove
Revision history for this message
Matt Riedemann (mriedem) wrote :

oslo.messaging 4.1.0 was released on 2/4 so we suspect there was a regression in that:

https://github.com/openstack/oslo.messaging/compare/4.0.0...4.1.0

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

1 hit in last 7 days. Please reopen if this repeats again.

Changed in oslo.messaging:
status: Confirmed → Incomplete
Sean Dague (sdague)
Changed in oslo.messaging:
status: Incomplete → New
Revision history for this message
Dan Smith (danms) wrote :
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Looks like 5 hits starting 7th. all on build_node devstack-trusty-osic-cloud1-XYZ

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :
Revision history for this message
Markus Zoeller (markus_z) (mzoeller) wrote :

Adding "gate-failure" tag as this issue got noted in the gate.

tags: added: gate-failure
Revision history for this message
Sean Dague (sdague) wrote :

This is not a nova issue, this is an oslo.messaging failure. The Nova API is just the first thing to get called.

Changed in nova:
status: New → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)

Fix proposed to branch: master
Review: https://review.openstack.org/292181

Changed in oslo.messaging:
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/292313

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.openstack.org/292181
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=b6a8d51e6cdbe69b38a0c51401e7ffd07d67a9ba
Submitter: Jenkins
Branch: master

commit b6a8d51e6cdbe69b38a0c51401e7ffd07d67a9ba
Author: Davanum Srinivas <email address hidden>
Date: Sun Mar 13 20:41:01 2016 -0400

    Bump rabbit_transient_queues_ttl to 30 mins

    In I83a8d09dc0cdae24c12d7043ec810529a9ce57ab, we
    made reply and fanout queues expire instead of auto-delete
    and set the rabbit_transient_queues_ttl to 10 mins.

    In grenade, we seem to see the queues expiring just around
    the time the new side is coming up which causes havoc. There
    is no reason the rabbit_transient_queues_ttl should not be
    higher. So let's bump up the expiry to 30 mins.

    Closes-Bug: #1545002
    Change-Id: I70a29b762198129fe0a3e904d9f2a7d4242b322c

Changed in oslo.messaging:
status: In Progress → Fix Released
tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/mitaka)

Reviewed: https://review.openstack.org/292313
Committed: https://git.openstack.org/cgit/openstack/oslo.messaging/commit/?id=f8cad0e76d1d7ed0c60b0a7c2cd6c157382f0e9a
Submitter: Jenkins
Branch: stable/mitaka

commit f8cad0e76d1d7ed0c60b0a7c2cd6c157382f0e9a
Author: Davanum Srinivas <email address hidden>
Date: Sun Mar 13 20:41:01 2016 -0400

    Bump rabbit_transient_queues_ttl to 30 mins

    In I83a8d09dc0cdae24c12d7043ec810529a9ce57ab, we
    made reply and fanout queues expire instead of auto-delete
    and set the rabbit_transient_queues_ttl to 10 mins.

    In grenade, we seem to see the queues expiring just around
    the time the new side is coming up which causes havoc. There
    is no reason the rabbit_transient_queues_ttl should not be
    higher. So let's bump up the expiry to 30 mins.

    Closes-Bug: #1545002
    Change-Id: I70a29b762198129fe0a3e904d9f2a7d4242b322c

Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote : Fix included in openstack/oslo.messaging 4.5.1

This issue was fixed in the openstack/oslo.messaging 4.5.1 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/oslo.messaging 5.0.0

This issue was fixed in the openstack/oslo.messaging 5.0.0 release.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.