Comment 5 for bug 1807723

Matt Riedemann (mriedem) wrote : Re: swap multiattach volume intermittently fails when servers are on different hosts

Looking at the swap volume flow in nova again, I think

https://github.com/openstack/nova/blob/ae3064b7a820ea02f7fc8a1aa4a41f35a06534f1/nova/compute/manager.py#L5798-L5806

is likely intentional since for volume1/server1 there is only a single BDM record. It starts out with the old volume_id and attachment_id, and then once the swap is complete, the compute manager code, rather than deleting the old BDM for vol1/server1 and creating a new BDM for vol2/server1, just updates the BDM for vol1/server1 to point at vol2 (it updates the connection_info, attachment_id and volume_id to be for vol2 instead of vol1).
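
To make that concrete, the in-place update amounts to something like this (a minimal, hypothetical sketch based on the description above, not the literal manager.py code; the function name is made up and the field names come from the comment):

    # Sketch only: repoint the existing vol1/server1 BDM at vol2 instead of
    # deleting that record and creating a new one.
    def _finish_swap_volume(bdm, new_volume_id, new_attachment_id,
                            new_connection_info):
        bdm.volume_id = new_volume_id
        bdm.attachment_id = new_attachment_id
        bdm.connection_info = new_connection_info
        bdm.save()  # same BDM row, now pointing at the new volume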

So when we delete server1, it should clean up volume2 via the BDM that points at volume2. And when we delete server2, it should clean up volume1 via the BDM that points at volume1.

The problem likely goes back to the old vol1/server1 attachment delete failing here:

https://github.com/openstack/cinder/blob/9d4fd16bcd8eca930910798cc519cb5bc5846c59/cinder/volume/manager.py#L4528

Maybe we need to retry in that case if we're hitting a race? Maybe we need to detect "tgtadm: this target is still active" here:

https://github.com/openstack/cinder/blob/9d4fd16bcd8eca930910798cc519cb5bc5846c59/cinder/volume/targets/tgt.py#L283

And retry?
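
Roughly what that could look like (a hypothetical sketch, not actual cinder code; the helper name, exception handling and timings are made up) is to detect that string in the failure and retry the target delete a few times before giving up:

    import time

    STILL_ACTIVE = 'this target is still active'

    # Sketch only: run_delete() is assumed to raise an exception whose
    # message contains the tgtadm/tgt-admin output when removal fails.
    def _delete_target_with_retry(run_delete, retries=3, interval=2):
        for attempt in range(1, retries + 1):
            try:
                run_delete()
                return
            except Exception as exc:
                if STILL_ACTIVE not in str(exc) or attempt == retries:
                    raise
                # give the initiator side time to finish detaching
                time.sleep(interval)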