Comment 11 for bug 1807723

Matt Riedemann (mriedem) wrote : Re: swap multiattach volume intermittently fails when servers are on different hosts

With the debug patch applied, I can see that we're definitely hitting some weird DB issue with BDMs during server delete:

logs.openstack.org/78/606978/5/check/tempest-slow/5a90cad/controller/logs/screen-n-api.txt.gz#_Dec_10_20_29_39_767102

Dec 10 20:29:39.767102 ubuntu-xenial-rax-ord-0001105586 <email address hidden>[23323]: ERROR nova.compute.api [None req-04f71fae-15dd-4cc0-b211-4a4f53a7cbc8 tempest-TestMultiAttachVolumeSwap-1722594678 tempest-TestMultiAttachVolumeSwap-1722594678] [instance: c3c9407c-e2af-4d04-94ed-f334844ea6bf] No volume BDMs found for server.
Dec 10 20:29:39.775519 ubuntu-xenial-rax-ord-0001105586 <email address hidden>[23323]: ERROR nova.compute.api [None req-04f71fae-15dd-4cc0-b211-4a4f53a7cbc8 tempest-TestMultiAttachVolumeSwap-1722594678 tempest-TestMultiAttachVolumeSwap-1722594678] [instance: c3c9407c-e2af-4d04-94ed-f334844ea6bf] BDMs were already deleted: [BlockDeviceMapping(attachment_id=None,boot_index=0,connection_info=None,created_at=2018-12-10T20:28:24Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/vda',device_type='disk',disk_bus=None,guest_format=None,id=1,image_id='863afc54-1096-4382-b8f2-6103641a65c1',instance=<?>,instance_uuid=c3c9407c-e2af-4d04-94ed-f334844ea6bf,no_device=False,snapshot_id=None,source_type='image',tag=None,updated_at=2018-12-10T20:28:25Z,uuid=be69a559-05ee-43bd-8170-2c65cc2a518c,volume_id=None,volume_size=None,volume_type=None), BlockDeviceMapping(attachment_id=be11fe1f-5c65-4a64-a5c6-caa934f564c9,boot_index=None,connection_info='{"status": "reserved", "multiattach": true, "detached_at": "", "volume_id": "26af085c-f977-4508-8bb1-46a57c8f34ed", "attach_mode": "null", "driver_volume_type": "iscsi", "instance": "c3c9407c-e2af-4d04-94ed-f334844ea6bf", "attached_at": "", "serial": "26af085c-f977-4508-8bb1-46a57c8f34ed", "data": {"access_mode": "rw", "target_discovered": false, "encrypted": false, "qos_specs": null, "target_iqn": "iqn.2010-10.org.openstack:volume-26af085c-f977-4508-8bb1-46a57c8f34ed", "target_portal": "10.210.4.21:3260", "volume_id": "26af085c-f977-4508-8bb1-46a57c8f34ed", "target_lun": 1, "device_path": "/dev/sda", "auth_password": "***", "auth_username": "P2uFcW9nEpz3GcuKwqHv", "auth_method": 
"CHAP"}}',created_at=2018-12-10T20:28:37Z,delete_on_termination=False,deleted=True,deleted_at=2018-12-10T20:29:39Z,destination_type='volume',device_name='/dev/vdb',device_type=None,disk_bus=None,guest_format=None,id=4,image_id=None,instance=<?>,instance_uuid=c3c9407c-e2af-4d04-94ed-f334844ea6bf,no_device=False,snapshot_id=None,source_type='volume',tag=None,updated_at=2018-12-10T20:28:44Z,uuid=15aba0c7-c
Dec 10 20:29:39.776294 ubuntu-xenial-rax-ord-0001105586 <email address hidden>[23323]: 612-40a4-9445-dfe964764b02,volume_id='26af085c-f977-4508-8bb1-46a57c8f34ed',volume_size=1,volume_type=None)]

That's one of the TestMultiAttachVolumeSwap servers being deleted, and the volume BDM representing its multiattach volume is already showing up as deleted, so we don't detach it. It's not yet clear to me what is deleting that BDM.

I have a suspicion that in the original scenario:

a) volume1 attached to server1 and server2
b) swap volume1 to volume2 on server1
c) delete server1 and server2
d) delete volume1 fails because it's not disconnected

Something is going on in step (b): after the swap, server1 is linked back to both volume1 and volume2, and when deleting server1 we delete the volume1 BDM even though we fail to disconnect it (because it's still attached to server2).
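To make the suspicion concrete, here is a minimal, purely hypothetical sketch of one way the "BDMs were already deleted" symptom could arise. None of these names (Bdm, swap_volume, delete_server) are Nova's real APIs; the hypothesized bug is the swap path soft-deleting the old volume's BDM rows without filtering by instance, so a later server delete finds no live volume BDMs and skips the detach.

```python
# Hypothetical, simplified model of the suspected BDM bookkeeping bug.
# All names here are illustrative only; they are not Nova's actual code.

class Bdm:
    def __init__(self, instance, volume):
        self.instance = instance
        self.volume = volume
        self.deleted = False  # soft-delete flag, like Nova's DB rows

class Db:
    def __init__(self):
        self.bdms = []

    def volume_bdms(self, instance):
        # Mirrors a lookup that filters out soft-deleted rows.
        return [b for b in self.bdms
                if b.instance == instance and not b.deleted]

def swap_volume(db, instance, old_volume, new_volume):
    # Hypothesized bug: the swap soft-deletes every BDM row for the old
    # volume instead of only this instance's row.
    for b in db.bdms:
        if b.volume == old_volume:  # BUG (hypothesized): no instance filter
            b.deleted = True
    db.bdms.append(Bdm(instance, new_volume))

def delete_server(db, instance, detached):
    bdms = db.volume_bdms(instance)
    if not bdms:
        # Corresponds to the "No volume BDMs found for server" log line:
        # detach is skipped, so the host connection is never cleaned up.
        return
    for b in bdms:
        detached.append((b.instance, b.volume))
        b.deleted = True

db = Db()
db.bdms.append(Bdm("server1", "volume1"))
db.bdms.append(Bdm("server2", "volume1"))  # multiattach: shared volume

detached = []
swap_volume(db, "server1", "volume1", "volume2")
delete_server(db, "server2", detached)

# server2's volume1 BDM was already soft-deleted by the swap, so nothing
# gets detached and volume1 stays connected on server2's host.
print(detached)  # → []
```

Under this (unverified) model, the subsequent volume delete in step (d) would fail exactly as observed, because the iSCSI connection on server2's host was never torn down.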