Unable to update the attachment.: MessagingTimeout

Bug #1763712 reported by Matt Riedemann
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Confirmed
Medium
Unassigned

Bug Description

Seeing this in the gate:

http://logs.openstack.org/30/560930/1/gate/nova-next/dd44138/logs/screen-c-api.txt.gz?level=TRACE#_Apr_13_00_50_45_642761

Apr 13 00:50:45.642761 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments [req-a4a450bd-abb6-4812-a70b-b49edad3e469 req-1ae13a6c-f325-4a29-a0e3-b0a193b71869 tempest-ServerActionsTestJSON-1769432331 tempest-ServerActionsTestJSON-1769432331] Unable to update the attachment.: MessagingTimeout: Timed out waiting for a reply to message ID 90fe8ccba0a044f7a3a9770048c6d1b2
Apr 13 00:50:45.643045 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments Traceback (most recent call last):
Apr 13 00:50:45.643251 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/opt/stack/new/cinder/cinder/api/v3/attachments.py", line 228, in update
Apr 13 00:50:45.643460 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments connector))
Apr 13 00:50:45.643673 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/opt/stack/new/cinder/cinder/volume/api.py", line 2162, in attachment_update
Apr 13 00:50:45.643876 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments attachment_ref.id))
Apr 13 00:50:45.644079 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/opt/stack/new/cinder/cinder/rpc.py", line 187, in _wrapper
Apr 13 00:50:45.644280 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments return f(self, *args, **kwargs)
Apr 13 00:50:45.644481 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/opt/stack/new/cinder/cinder/volume/rpcapi.py", line 442, in attachment_update
Apr 13 00:50:45.644694 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments attachment_id=attachment_id)
Apr 13 00:50:45.644897 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
Apr 13 00:50:45.645108 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments retry=self.retry)
Apr 13 00:50:45.645341 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
Apr 13 00:50:45.645596 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments timeout=timeout, retry=retry)
Apr 13 00:50:45.645820 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
Apr 13 00:50:45.646043 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments retry=retry)
Apr 13 00:50:45.646260 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send
Apr 13 00:50:45.646487 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments result = self._waiter.wait(msg_id, timeout)
Apr 13 00:50:45.646704 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait
Apr 13 00:50:45.646929 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments message = self.waiters.get(msg_id, timeout=timeout)
Apr 13 00:50:45.647139 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get
Apr 13 00:50:45.647351 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments 'to message ID %s' % msg_id)
Apr 13 00:50:45.647658 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments MessagingTimeout: Timed out waiting for a reply to message ID 90fe8ccba0a044f7a3a9770048c6d1b2
Apr 13 00:50:45.647884 ubuntu-xenial-inap-mtl01-0003494782 <email address hidden>[826]: ERROR cinder.api.v3.attachments

It's intermittent.

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Unable%20to%20update%20the%20attachment.%3A%20MessagingTimeout%5C%22%20AND%20tags%3A%5C%22screen-c-api.txt%5C%22&from=7d

20 hits in 7 days, check and gate, master and stable/queens.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Hmm, not seeing corresponding logs on the c-vol side. I wonder if we are hitting rabbitmq issues?

Revision history for this message
Matt Riedemann (mriedem) wrote :

Similar to:

https://bugs.launchpad.net/cinder/+bug/1739482/comments/2

It might be worth using the long_rpc_timeout approach for this assuming the http response doesn't timeout.

I do see this fail mostly in the nova-multiattach job, but in this case it's not a multiattach test:

http://logs.openstack.org/81/624181/1/check/nova-multiattach/c1589ae/job-output.txt.gz#_2018-12-10_22_03_34_203163

2018-12-10 22:03:34.203163 | primary | {1} tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_resize_server_revert_with_volume_attached [99.666432s] ... FAILED

So I'm not sure how that would be related.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Download full text (3.3 KiB)

Looking at a recent failure:

http://logs.openstack.org/81/624181/1/check/nova-multiattach/c1589ae/logs/screen-c-api.txt#_Dec_10_22_03_25_888783

The cinder-api request starts here:

Dec 10 22:02:25.787298 ubuntu-xenial-inap-mtl01-0001107482 <email address hidden>[30020]: INFO cinder.api.openstack.wsgi [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 tempest-ServerActionsTestJSON-1162184781] PUT http://198.72.124.211/volume/v3/54189e280b844b928184fb49836be237/attachments/e8c408fd-969b-4dac-93b6-8031f6634fc8

and times out here:

Dec 10 22:03:25.888783 ubuntu-xenial-inap-mtl01-0001107482 <email address hidden>[30020]: ERROR cinder.api.v3.attachments [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 tempest-ServerActionsTestJSON-1162184781] Unable to update the attachment.: MessagingTimeout: Timed out waiting for a reply to message ID 60d2d0479c7749adb7f116a8d6de570c

The request ID is req-fec531fa-7200-4e15-a820-18ea7e5e6214. Tracing that down into cinder-volume logs I see this as the first thing logged:

Dec 10 22:02:25.906808 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 None] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07 {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:372}}

The next entry is:

Dec 10 22:03:24.340740 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-2627d823-75bc-4092-bb8e-f079761d732f req-fec531fa-7200-4e15-a820-18ea7e5e6214 tempest-ServerActionsTestJSON-1162184781 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07" returned: 0 in 58.435s {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

This is 1 second before the timeout. So "lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-b62c0c8d-b3cd-46d0-b009-cabc13115f07" is taking 58.435s and the rpc_response_timeout is 60 seconds by default.

Looking at syslog, I noticed that a few seconds before that lvchange call, another is running on another volume which takes 61+ seconds:

Dec 10 22:03:24.317836 ubuntu-xenial-inap-mtl01-0001107482 cinder-volume[31110]: DEBUG oslo_concurrency.processutils [req-9e16b85d-5662-4c70-aa13-88cd9c6a03fc req-ba539360-fea6-4538-88dd-afa948120e1c tempest-ServerRescueNegativeTestJSON-1204517625 None] CMD "sudo cinder-rootwrap /etc/cinder/rootwrap.conf lvchange -a y --yes -K stack-volumes-lvmdriver-1/volume-c48c90f2-bf42-473b-b9a4-48f7e6f97f24" returned: 0 in 61.526s {{(pid=31207) execute /usr/local/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:409}}

Looking at the activate_lv code in cinder it doesn't appear to be doing any locking or anything from the cind...

Read more...

Revision history for this message
Matt Riedemann (mriedem) wrote :

Related change: https://review.openstack.org/624809

Dan Smith pointed out in https://linux.die.net/man/5/lvm.conf that by default (locking_type/wait_for_locks) it will lock for concurrent operations to avoid corrupting data.

(2:59:46 PM) dansmith: 1. access to dm is one at a time, IIRC, which means they'll fight to setup dm devices to some degree, and
(3:00:09 PM) dansmith: 2. If they're doing anything that causes buffer cache to flush, two running at once on an io-limited system will slow things down
(3:00:24 PM) dansmith: IIRC, there is a long history of issues with lvm concurrency

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/624809
Reason: Quick recheck

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/624809
Reason: Must have done it too quickly.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/624809
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=206f980cc2f09610ffb86cfc7728222fd39eee4f
Submitter: Zuul
Branch: master

commit 206f980cc2f09610ffb86cfc7728222fd39eee4f
Author: Dan Smith <email address hidden>
Date: Wed Dec 12 12:57:04 2018 -0800

    Synchronize all LVM operations

    This makes us synchronize/serialize all of our external calls to LVM
    operations in an attempt to avoid contention of LVM's own locking
    scheme, as well as avoid potentially overlapping operations that may
    be flushing large amounts of data to disk.

    Related-Bug: #1739482
    Related-Bug: #1763712
    Change-Id: Idb4053685486f2163e7cb135cd579eba5d78f369

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.