tempest.scenario.test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern fails with rbd

Bug #1554045 reported by Michal Jura
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Michal Jura

Bug Description

With rbd, the Tempest scenario test_volume_boot_pattern.TestVolumeBootPatternV2.test_volume_boot_pattern sometimes fails with the following error:

2016-02-22 21:46:55.983 23489 WARNING cinder.volume.drivers.rbd [req-9ae2daa0-8df2-44fa-941f-9590ab8fae85 3c43fc722be1439dbdbc6487f468a924 18dad7f31d074553ab095fdf7cc7ce4e - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.

2016-02-22 21:46:55.989 23489 ERROR cinder.volume.manager [req-9ae2daa0-8df2-44fa-941f-9590ab8fae85 3c43fc722be1439dbdbc6487f468a924 18dad7f31d074553ab095fdf7cc7ce4e - - -] Unable to delete busy volume.

Changed in cinder:
assignee: nobody → Michal Jura (mjura)
status: New → In Progress
Revision history for this message
Michal Jura (mjura) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/289252

Michal Jura (mjura)
tags: added: ceph
Michal Jura (mjura)
tags: added: drivers
Revision history for this message
Jon Bernard (jbernard) wrote :

Logs related to this particular failure would be most helpful. I've looked at several of these, and there is at least one race in nova's bdm code. If you're able to reproduce that, the resulting logs could help isolate the bug.

Changed in cinder:
status: In Progress → Incomplete
Revision history for this message
Michal Jura (mjura) wrote :
Changed in cinder:
status: Incomplete → In Progress
Revision history for this message
Arx Cruz (arxcruz) wrote :

This also affects tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON.test_rescued_vm_detach_volume

Even with the patch applied, I'm still seeing this (although test_volume_boot_pattern passes):

2016-03-09 08:57:52.629 32268 INFO cinder.volume.flows.manager.create_volume [req-863b5a63-a0a0-4958-8901-2d85bdd44f4c ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Volume 8c115952-c088-4ac7-91c9-5b8dee5d591d: being created as raw with specification: {'status': u'creating', 'volume_size': 1, 'volume_name': u'volume-8c115952-c088-4ac7-91c9-5b8dee5d591d'}
2016-03-09 08:57:52.943 32268 INFO cinder.volume.flows.manager.create_volume [req-863b5a63-a0a0-4958-8901-2d85bdd44f4c ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Volume volume-8c115952-c088-4ac7-91c9-5b8dee5d591d (8c115952-c088-4ac7-91c9-5b8dee5d591d): created successfully
2016-03-09 08:57:52.947 32268 INFO cinder.volume.manager [req-863b5a63-a0a0-4958-8901-2d85bdd44f4c ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Created volume successfully.
2016-03-09 08:57:55.831 32268 INFO cinder.volume.manager [req-a4cdf415-e2d4-41a4-8698-e0cd49750c07 ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Initialize volume connection completed successfully.
2016-03-09 08:57:56.849 32268 INFO cinder.volume.manager [req-73c7efc6-2fa9-42a3-bf4f-f07fe7cc262b ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Attach volume completed successfully.
2016-03-09 08:58:11.883 32268 INFO cinder.volume.manager [req-ef697eda-a3c9-44ee-9a88-286a1bebc9fc ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Terminate volume connection completed successfully.
2016-03-09 08:58:12.473 32268 INFO cinder.volume.manager [req-11d6fc86-f425-4a5b-9c1d-c4429af1e7b0 ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Detach volume completed successfully.
2016-03-09 08:58:43.437 32268 WARNING cinder.volume.drivers.rbd [req-b9a359cf-53fc-46ec-83f0-20aac64a7e5e ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] ImageBusy error raised while deleting rbd volume. This may have been caused by a connection from a client that has crashed and, if so, may be resolved by retrying the delete after 30 seconds has elapsed.
2016-03-09 08:58:43.447 32268 ERROR cinder.volume.manager [req-b9a359cf-53fc-46ec-83f0-20aac64a7e5e ed2b7665621e40e48f0290628e2ca064 b8a17af901de48399243b8a8cdd38129 - - -] Unable to delete busy volume.

Revision history for this message
Michal Jura (mjura) wrote :

This might be something else, but I'm trying to reproduce this problem.

Right now this test is passing for me:

tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON
    test_rescued_vm_detach_volume[id-f56e465b-fe10-48bf-b75d-646cda3a8bc9,negative,volume]OK 32.86
Slowest 1 tests took 32.86 secs:
tempest.api.compute.servers.test_server_rescue_negative.ServerRescueNegativeTestJSON
    test_rescued_vm_detach_volume[id-f56e465b-fe10-48bf-b75d-646cda3a8bc9,negative,volume] 32.86
Ran 1 test in 68.525s
OK

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/289252
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=66bd2b39d7490facc55dee3cab9523e8608fe491
Submitter: Jenkins
Branch: master

commit 66bd2b39d7490facc55dee3cab9523e8608fe491
Author: Michal Jura <email address hidden>
Date: Mon Mar 7 11:29:32 2016 +0100

    Fix failure with rbd on slow ceph clusters

    Make rados connection interval and retries configurable
    for _try_remove_volume() function

    Otherwise on slow ceph clusters, we can get following problem:

    "ImageBusy error raised while deleting rbd volume. This may have been
    caused by a connection from a client that has crashed and, if so,
    may be resolved by retrying the delete after 30 seconds has elapsed."

    Change-Id: I1230715663ea00c3eb4241154e6f194dee0e23d4
    Co-Authored-By: Dirk Mueller <email address hidden>
    Closes-Bug: #1554045
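
The idea in the commit message above, retrying a busy removal with a configurable interval and number of retries, can be sketched roughly as follows. This is only an illustration and not the actual Cinder patch: the ImageBusy class, remove_with_retries(), and the retries/interval parameter names are hypothetical stand-ins, and the removal itself is faked so the example runs without a Ceph cluster.

```python
import time


class ImageBusy(Exception):
    """Hypothetical stand-in for the busy error raised during rbd removal."""


def remove_with_retries(remove, retries=3, interval=5.0, sleep=time.sleep):
    """Call remove() until it succeeds or the retry budget is used up.

    retries and interval are illustrative names for the two configurable
    values; sleep is injectable so the sketch can run without waiting.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return remove()
        except ImageBusy:
            # Give up only after the last allowed attempt.
            if attempt == retries:
                raise
            sleep(interval)


# Usage: a fake removal that reports "busy" twice and then succeeds.
calls = {"n": 0}


def fake_remove():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ImageBusy("image still in use")
    return "removed"


waits = []  # records the requested sleep intervals instead of sleeping
result = remove_with_retries(fake_remove, retries=5, interval=10.0,
                             sleep=waits.append)
print(result, waits)
```

On a slow cluster, raising the interval or the number of retries gives a lingering client connection more time to go away before the delete is reported as failed.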

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/291036

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/liberty)

Reviewed: https://review.openstack.org/291036
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=59312777551baadab41ec9baa7517132b58f1142
Submitter: Jenkins
Branch: stable/liberty

commit 59312777551baadab41ec9baa7517132b58f1142
Author: Michal Jura <email address hidden>
Date: Mon Mar 7 11:29:32 2016 +0100

    Fix failure with rbd on slow ceph clusters

    Make rados connection interval and retries configurable
    for _try_remove_volume() function

    Otherwise on slow ceph clusters, we can get following problem:

    "ImageBusy error raised while deleting rbd volume. This may have been
    caused by a connection from a client that has crashed and, if so,
    may be resolved by retrying the delete after 30 seconds has elapsed."

    Change-Id: I1230715663ea00c3eb4241154e6f194dee0e23d4
    Co-Authored-By: Dirk Mueller <email address hidden>
    Closes-Bug: #1554045
    (cherry picked from commit 66bd2b39d7490facc55dee3cab9523e8608fe491)

tags: added: in-stable-liberty
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/cinder 8.0.0.0rc1

This issue was fixed in the openstack/cinder 8.0.0.0rc1 release candidate.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/cinder 7.0.2

This issue was fixed in the openstack/cinder 7.0.2 release.
