test_encrypted_cinder_volumes_luks fails to detach encrypted volume

Bug #1374458 reported by Matt Riedemann
This bug affects 1 person
Affects                                    Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)                   Fix Released  High        Matt Riedemann
OpenStack Compute (nova) icehouse series   Fix Released  Undecided   Unassigned

Bug Description

http://logs.openstack.org/98/124198/3/check/check-grenade-dsvm-icehouse/c89f18f/console.html#_2014-09-26_03_38_56_940

2014-09-26 03:38:57.259 | Traceback (most recent call last):
2014-09-26 03:38:57.259 | File "tempest/scenario/manager.py", line 142, in delete_wrapper
2014-09-26 03:38:57.259 | delete_thing(*args, **kwargs)
2014-09-26 03:38:57.259 | File "tempest/services/volume/json/volumes_client.py", line 108, in delete_volume
2014-09-26 03:38:57.259 | resp, body = self.delete("volumes/%s" % str(volume_id))
2014-09-26 03:38:57.259 | File "tempest/common/rest_client.py", line 225, in delete
2014-09-26 03:38:57.259 | return self.request('DELETE', url, extra_headers, headers, body)
2014-09-26 03:38:57.259 | File "tempest/common/rest_client.py", line 435, in request
2014-09-26 03:38:57.259 | resp, resp_body)
2014-09-26 03:38:57.259 | File "tempest/common/rest_client.py", line 484, in _error_checker
2014-09-26 03:38:57.259 | raise exceptions.BadRequest(resp_body)
2014-09-26 03:38:57.259 | BadRequest: Bad request
2014-09-26 03:38:57.260 | Details: {u'message': u'Invalid volume: Volume status must be available or error, but current status is: in-use', u'code': 400}
2014-09-26 03:38:57.260 | }}}
2014-09-26 03:38:57.260 |
2014-09-26 03:38:57.260 | traceback-2: {{{
2014-09-26 03:38:57.260 | Traceback (most recent call last):
2014-09-26 03:38:57.260 | File "tempest/common/rest_client.py", line 561, in wait_for_resource_deletion
2014-09-26 03:38:57.260 | raise exceptions.TimeoutException(message)
2014-09-26 03:38:57.260 | TimeoutException: Request timed out
2014-09-26 03:38:57.260 | Details: (TestEncryptedCinderVolumes:_run_cleanups) Failed to delete resource 704461b6-3421-4959-8113-a011e6410ede within the required time (196 s).
2014-09-26 03:38:57.260 | }}}
2014-09-26 03:38:57.260 |
2014-09-26 03:38:57.261 | traceback-3: {{{
2014-09-26 03:38:57.261 | Traceback (most recent call last):
2014-09-26 03:38:57.261 | File "tempest/services/volume/json/admin/volume_types_client.py", line 97, in delete_volume_type
2014-09-26 03:38:57.261 | resp, body = self.delete("types/%s" % str(volume_id))
2014-09-26 03:38:57.261 | File "tempest/common/rest_client.py", line 225, in delete
2014-09-26 03:38:57.261 | return self.request('DELETE', url, extra_headers, headers, body)
2014-09-26 03:38:57.261 | File "tempest/common/rest_client.py", line 435, in request
2014-09-26 03:38:57.261 | resp, resp_body)
2014-09-26 03:38:57.261 | File "tempest/common/rest_client.py", line 484, in _error_checker
2014-09-26 03:38:57.261 | raise exceptions.BadRequest(resp_body)
2014-09-26 03:38:57.261 | BadRequest: Bad request
2014-09-26 03:38:57.261 | Details: {u'message': u'Target volume type is still in use.', u'code': 400}
2014-09-26 03:38:57.262 | }}}
2014-09-26 03:38:57.262 |
2014-09-26 03:38:57.262 | Traceback (most recent call last):
2014-09-26 03:38:57.262 | File "tempest/test.py", line 142, in wrapper
2014-09-26 03:38:57.262 | return f(self, *func_args, **func_kwargs)
2014-09-26 03:38:57.262 | File "tempest/scenario/test_encrypted_cinder_volumes.py", line 56, in test_encrypted_cinder_volumes_luks
2014-09-26 03:38:57.262 | self.attach_detach_volume()
2014-09-26 03:38:57.262 | File "tempest/scenario/test_encrypted_cinder_volumes.py", line 49, in attach_detach_volume
2014-09-26 03:38:57.262 | self.nova_volume_detach()
2014-09-26 03:38:57.262 | File "tempest/scenario/manager.py", line 439, in nova_volume_detach
2014-09-26 03:38:57.262 | 'available')
2014-09-26 03:38:57.262 | File "tempest/services/volume/json/volumes_client.py", line 181, in wait_for_volume_status
2014-09-26 03:38:57.263 | raise exceptions.TimeoutException(message)
2014-09-26 03:38:57.263 | TimeoutException: Request timed out
2014-09-26 03:38:57.263 | Details: Volume 704461b6-3421-4959-8113-a011e6410ede failed to reach available status within the required time (196 s).

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiRGV0YWlsczogKFRlc3RFbmNyeXB0ZWRDaW5kZXJWb2x1bWVzOl9ydW5fY2xlYW51cHMpIEZhaWxlZCB0byBkZWxldGUgcmVzb3VyY2VcIiBBTkQgbWVzc2FnZTpcIndpdGhpbiB0aGUgcmVxdWlyZWQgdGltZVwiIEFORCB0YWdzOlwiY29uc29sZVwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiI2MDQ4MDAiLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsidXNlcl9pbnRlcnZhbCI6MH0sInN0YW1wIjoxNDExNzM4OTc0MTMwfQ==

130 hits in 7 days, check and gate, all failures.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The delete of the volume starts here:

http://logs.openstack.org/98/124198/3/check/check-grenade-dsvm-icehouse/c89f18f/logs/new/screen-c-vol.txt.gz#_2014-09-26_03_38_53_037

I see some volume detach messages after that.

There is an error in the n-cpu logs:

http://logs.openstack.org/98/124198/3/check/check-grenade-dsvm-icehouse/c89f18f/logs/new/screen-n-cpu.txt.gz?level=TRACE#_2014-09-26_03_32_17_403

It fails to detach the volume, which is why cinder can't delete it (the volume is still in-use), so we time out waiting for the volume to be deleted.

tags: added: volumes
Changed in nova:
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :

2014-09-26 03:32:17.403 ERROR nova.compute.manager [req-89e6d1c1-822a-4387-8af1-93c3e007bec6 TestEncryptedCinderVolumes-1122192554 TestEncryptedCinderVolumes-1697943167] [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] Failed to detach volume 704461b6-3421-4959-8113-a011e6410ede from /dev/vdb
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] Traceback (most recent call last):
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/compute/manager.py", line 4216, in _detach_volume
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] encryption=encryption)
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1384, in detach_volume
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] encryptor.detach_volume(**encryption)
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/volume/encryptors/cryptsetup.py", line 101, in detach_volume
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] self._close_volume(**kwargs)
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/volume/encryptors/luks.py", line 106, in _close_volume
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] run_as_root=True, check_exit_code=True)
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/utils.py", line 165, in execute
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] return processutils.execute(*cmd, **kwargs)
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] File "/opt/stack/new/nova/nova/openstack/common/processutils.py", line 193, in execute
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] cmd=' '.join(cmd))
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] ProcessExecutionError: Unexpected error while running command.
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] Command: sudo nova-rootwrap /etc/nova/rootwrap.conf cryptsetup luksClose ip-127.0.0.1:3260-iscsi-iqn.2010-10.org.openstack:volume-704461b6-3421-4959-8113-a011e6410ede-lun-1
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] Exit code: 5
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-8c87-70d7118fc8c8] Stdout: ''
2014-09-26 03:32:17.403 22504 TRACE nova.compute.manager [instance: 4374ceb4-1ac1-48dc-...


no longer affects: cinder
Revision history for this message
Matt Riedemann (mriedem) wrote :

Bug 1348204 covers a similar race on the attach side of encrypted volumes.

summary: - Race to delete volume in test_encrypted_cinder_volumes_luks
+ test_encrypted_cinder_volumes_luks fails to detach encrypted volume
Revision history for this message
Matt Riedemann (mriedem) wrote :

Exit code 5 from "cryptsetup luksClose" means the device is busy:

https://code.google.com/p/cryptsetup/issues/detail?id=100

So we should just retry on that error until the device closes or we time out.

Revision history for this message
Matt Riedemann (mriedem) wrote :

The code doesn't do any kind of retry right now, so adding one should help:

    def _close_volume(self, **kwargs):
        """Closes the device (effectively removes the dm-crypt mapping)."""
        LOG.debug("closing encrypted volume %s", self.dev_path)
        utils.execute('cryptsetup', 'luksClose', self.dev_name,
                      run_as_root=True, check_exit_code=True)
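
Something like this should work. This is only a sketch; it assumes nova.utils.execute passes the attempts/delay_on_retry keywords through to processutils.execute, which retries the command when attempts > 1:

    def _close_volume(self, **kwargs):
        """Closes the device (effectively removes the dm-crypt mapping)."""
        LOG.debug("closing encrypted volume %s", self.dev_path)
        # cryptsetup luksClose exits with code 5 when the device is still
        # busy, which raises ProcessExecutionError because
        # check_exit_code=True; attempts=3 re-runs the command a couple of
        # times before giving up, matching what we do for lvremove.
        utils.execute('cryptsetup', 'luksClose', self.dev_name,
                      run_as_root=True, check_exit_code=True,
                      attempts=3)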

Changed in nova:
importance: Undecided → High
status: Confirmed → Triaged
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/124418

Changed in nova:
status: Triaged → In Progress
Matt Riedemann (mriedem)
tags: added: icehouse-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/124418
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=eef97cdf4bb7f426d7feb394ef54510db8b1656b
Submitter: Jenkins
Branch: master

commit eef97cdf4bb7f426d7feb394ef54510db8b1656b
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 26 07:31:49 2014 -0700

    Retry on closing of luks encrypted volume in case device is busy

    We're seeing races in the gate on detach of an encrypted volume where
    the device is busy so the detach fails, which causes the delete of the
    volume on the cinder side to fail because the volume is still in use and
    then Tempest times out waiting for the volume to be deleted.

    This simply adds a retry to close the encryption device. I use
    attempts=3 since that's what we use for lvremove.

    Closes-Bug: #1374458

    Change-Id: Ia0667da7dee247ba8c3338b296244b2a71d1045d

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/130727

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/130727
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0695e144f7469ba98247e625fadc9a979a87af91
Submitter: Jenkins
Branch: stable/icehouse

commit 0695e144f7469ba98247e625fadc9a979a87af91
Author: Matt Riedemann <email address hidden>
Date: Fri Sep 26 07:31:49 2014 -0700

    Retry on closing of luks encrypted volume in case device is busy

    We're seeing races in the gate on detach of an encrypted volume where
    the device is busy so the detach fails, which causes the delete of the
    volume on the cinder side to fail because the volume is still in use and
    then Tempest times out waiting for the volume to be deleted.

    This simply adds a retry to close the encryption device. I use
    attempts=3 since that's what we use for lvremove.

    Closes-Bug: #1374458

    Change-Id: Ia0667da7dee247ba8c3338b296244b2a71d1045d
    (cherry picked from commit eef97cdf4bb7f426d7feb394ef54510db8b1656b)

tags: added: in-stable-icehouse