test_snapshot_backup fails to build backup due to MessagingTimeout

Bug #1739482 reported by Matt Riedemann on 2017-12-20
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Medium
Unassigned

Bug Description

http://logs.openstack.org/46/527446/4/check/legacy-tempest-dsvm-neutron-full/a1d1824/job-output.txt.gz#_2017-12-20_19_41_04_137054

We see the MessagingTimeout in the c-bak logs here:

http://logs.openstack.org/46/527446/4/check/legacy-tempest-dsvm-neutron-full/a1d1824/logs/screen-c-bak.txt.gz#_Dec_20_19_40_07_078990

Dec 20 19:40:07.078990 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server [None req-a42446df-6ece-4fdb-bc66-0efd10bd87eb tempest-VolumesBackupsTest-1597191743 None] Exception during message handling: MessagingTimeout: Timed out waiting for a reply to message ID eed5916a0e21428eb823c3b80d43411f
Dec 20 19:40:07.079159 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Dec 20 19:40:07.079291 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Dec 20 19:40:07.079421 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Dec 20 19:40:07.079547 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Dec 20 19:40:07.079671 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Dec 20 19:40:07.079794 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Dec 20 19:40:07.079922 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Dec 20 19:40:07.080052 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 380, in create_backup
Dec 20 19:40:07.080175 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server self._update_backup_error(backup, six.text_type(err))
Dec 20 19:40:07.080296 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 20 19:40:07.080419 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server self.force_reraise()
Dec 20 19:40:07.080540 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 20 19:40:07.080662 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Dec 20 19:40:07.080784 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 369, in create_backup
Dec 20 19:40:07.080910 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server updates = self._run_backup(context, backup, volume)
Dec 20 19:40:07.081045 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 418, in _run_backup
Dec 20 19:40:07.081167 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server backup_device.is_snapshot)
Dec 20 19:40:07.081293 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 922, in _attach_device
Dec 20 19:40:07.081418 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server return self._attach_volume(ctxt, backup_device, properties)
Dec 20 19:40:07.081564 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 944, in _attach_volume
Dec 20 19:40:07.081687 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server {'volume_id', volume.id})
Dec 20 19:40:07.081801 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 20 19:40:07.081928 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server self.force_reraise()
Dec 20 19:40:07.082048 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 20 19:40:07.082165 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Dec 20 19:40:07.082282 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/backup/manager.py", line 932, in _attach_volume
Dec 20 19:40:07.082415 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server properties)
Dec 20 19:40:07.082539 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/opt/stack/new/cinder/cinder/volume/rpcapi.py", line 226, in initialize_connection
Dec 20 19:40:07.082669 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server volume=volume)
Dec 20 19:40:07.082792 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
Dec 20 19:40:07.082918 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server retry=self.retry)
Dec 20 19:40:07.083043 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 131, in _send
Dec 20 19:40:07.083166 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server timeout=timeout, retry=retry)
Dec 20 19:40:07.083299 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
Dec 20 19:40:07.083423 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server retry=retry)
Dec 20 19:40:07.083547 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send
Dec 20 19:40:07.083926 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server result = self._waiter.wait(msg_id, timeout)
Dec 20 19:40:07.084050 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait
Dec 20 19:40:07.084177 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server message = self.waiters.get(msg_id, timeout=timeout)
Dec 20 19:40:07.084300 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get
Dec 20 19:40:07.084473 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server 'to message ID %s' % msg_id)
Dec 20 19:40:07.084629 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server MessagingTimeout: Timed out waiting for a reply to message ID eed5916a0e21428eb823c3b80d43411f
Dec 20 19:40:07.084758 ubuntu-xenial-rax-ord-0001541209 cinder-backup[17817]: ERROR oslo_messaging.rpc.server

Around the same time, I see what appears to be another snapshot, maybe backup, related test failing in the c-vol logs, looks like it's also trying to create an export:

http://logs.openstack.org/46/527446/4/check/legacy-tempest-dsvm-neutron-full/a1d1824/logs/screen-c-vol.txt.gz#_Dec_20_19_40_21_346821

Dec 20 19:40:21.346821 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: ERROR cinder.volume.targets.tgt [None req-1e3657e2-1701-4998-b498-40ec746b8412 tempest-VolumesSnapshotTestJSON-1495982865 None] Failed recovery attempt to create iscsi backing lun for Volume ID:iqn.2010-10.org.openstack:volume-beb14f75-ac91-4b8d-a56b-3cb5565d3d5a: Unexpected error while running command.
Dec 20 19:40:21.346982 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/stack-volumes-lvmdriver-1/volume-beb14f75-ac91-4b8d-a56b-3cb5565d3d5a
Dec 20 19:40:21.347141 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Exit code: 22
Dec 20 19:40:21.347303 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Stdout: u''
Dec 20 19:40:21.347460 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Stderr: u'tgtadm: invalid request\n': ProcessExecutionError: Unexpected error while running command.
Dec 20 19:40:21.347620 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Command: sudo cinder-rootwrap /etc/cinder/rootwrap.conf tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/stack-volumes-lvmdriver-1/volume-beb14f75-ac91-4b8d-a56b-3cb5565d3d5a
Dec 20 19:40:21.347771 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Exit code: 22
Dec 20 19:40:21.348017 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Stdout: u''
Dec 20 19:40:21.348169 ubuntu-xenial-rax-ord-0001541209 cinder-volume[18152]: Stderr: u'tgtadm: invalid request\n'

http://logs.openstack.org/46/527446/4/check/legacy-tempest-dsvm-neutron-full/a1d1824/logs/screen-c-vol.txt.gz#_Dec_20_19_40_21_660736

I wonder if the competing requests to create an export is making it blow up and that's why there is no response to the c-bak service which causes the messaging timeout?

Matt Riedemann (mriedem) wrote :

In this case, we might want to consider leveraging the new heartbeat capability in oslo.messaging. See how nova is using that here for potentially long RPC calls during live migration:

https://review.openstack.org/#/c/566696/

The idea is for internal interactions between services that might otherwise timeout the global 60 seconds RPC call, we have a new long_rpc_timeout option that is used for certain operations that are known to potentially take longer than 60 seconds, like in this backup scenario.

If the target of the rpc call dies and fails the heartbeat, then you still get a MessagingTimeout error, but if the target is still up and responding to pings we let it go until we hit a hard long_rpc_timeout.

Changed in cinder:
status: New → Confirmed
importance: Undecided → Medium
Matt Riedemann (mriedem) wrote :

This could be related to the root issue in bug 1763712 where lvchange is taking nearly 60 seconds to complete. cinder-backup calls cinder-volume initialize_connection using an RPC call so it is a very similar failure.

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/624809
Reason: Quick recheck

Change abandoned by Sean McGinnis (<email address hidden>) on branch: master
Review: https://review.openstack.org/624809
Reason: Must have done it too quickly.

Reviewed: https://review.openstack.org/624809
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=206f980cc2f09610ffb86cfc7728222fd39eee4f
Submitter: Zuul
Branch: master

commit 206f980cc2f09610ffb86cfc7728222fd39eee4f
Author: Dan Smith <email address hidden>
Date: Wed Dec 12 12:57:04 2018 -0800

    Synchronize all LVM operations

    This makes us synchronize/serialize all of our external calls to LVM
    operations in an attempt to avoid contention of LVM's own locking
    scheme, as well as avoid potentially overlapping operations that may
    be flushing large amounts of data to disk.

    Related-Bug: #1739482
    Related-Bug: #1763712
    Change-Id: Idb4053685486f2163e7cb135cd579eba5d78f369

tags: added: test-coverage
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers