Comment 7 for bug 2019190

melanie witt (melwitt) wrote : Re: [RBD] Retyping of in-use boot volumes renders instances unusable (possible data corruption)

I uploaded a DNM tempest patch to run modified TestVolumeMigrateRetypeAttached tests in tempest/scenario/test_volume_migrate_attached.py against the master, stable/wallaby, and stable/victoria branches [1]:

  https://review.opendev.org/c/openstack/tempest/+/890360

The tests in that patch are modified to hard reboot the instance at the end.

The migrate volume test passes on all three branches, while the retype volume test fails on master and stable/wallaby but passes on stable/victoria [2].

The unmodified tests pass because they don't hard reboot the server, so the guest XML is never regenerated.

In the test logs on the DNM patch [2], I think I might have also found why migrate works while retype fails.

The RBD driver [3] decides which migration path to take based on the volume status. The test logs show that for migrate, the volume status is 'in-use', so the RBD driver (correctly) treats this as a move of an in-use volume between different pools and falls back to generic migration, which calls the Nova swap volume API. For retype, however, the volume status is 'retyping', so the driver does not refuse the assisted migration and goes ahead with it.
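The status-based decision can be modeled roughly as below. This is a hypothetical, simplified sketch, not the actual cinder RBD driver code: the `Volume` dataclass, the `choose_path_by_status` and `choose_path_by_attachment` functions, and the source pool name 'volumes' are illustrative assumptions (only the destination pool 'othervolumes' comes from the log's location_info). The first function mirrors the behavior seen in the logs, where only the literal 'in-use' status triggers the fallback; the second shows one hypothetical alternative that keys off the attachment state instead, so an attached volume in 'retyping' status would also fall back.

```python
from dataclasses import dataclass

# Hypothetical model of the path selection described above.
# NOT the real cinder code; names and fields are illustrative.

ASSISTED = "rbd_assisted"       # RBD assisted migration (moves the image directly)
GENERIC = "generic_fallback"    # generic migration (calls Nova swap volume for attached volumes)


@dataclass
class Volume:
    status: str          # e.g. 'available', 'in-use', 'retyping'
    attach_status: str   # e.g. 'attached', 'detached' (assumed field)
    pool: str            # RBD pool the volume currently lives in


def choose_path_by_status(volume: Volume, dst_pool: str) -> str:
    """Observed behavior: only the literal 'in-use' status makes a
    cross-pool move fall back to generic migration."""
    if volume.status == "in-use" and volume.pool != dst_pool:
        return GENERIC
    return ASSISTED


def choose_path_by_attachment(volume: Volume, dst_pool: str) -> str:
    """Hypothetical alternative: check whether the volume is attached,
    which is still true while its status is 'retyping'."""
    if volume.attach_status == "attached" and volume.pool != dst_pool:
        return GENERIC
    return ASSISTED


if __name__ == "__main__":
    migrating = Volume(status="in-use", attach_status="attached", pool="volumes")
    retyping = Volume(status="retyping", attach_status="attached", pool="volumes")
    print(choose_path_by_status(migrating, "othervolumes"))      # generic_fallback
    print(choose_path_by_status(retyping, "othervolumes"))       # rbd_assisted
    print(choose_path_by_attachment(retyping, "othervolumes"))   # generic_fallback
```

In this model the attached volume being retyped takes the assisted path only because its transient status hides the fact that it is still attached.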

Excerpts from the c-vol log:

migrate volume:

Aug 03 22:24:16.833416 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.manager [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Issue driver.migrate_volume. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/manager.py:2609}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Attempting RBD assisted volume migration. volume: 9a27b9cd-e6e5-4f29-a127-a030e94c5356, host: {'host': 'np0034853654@ceph2#ceph2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 24.56, 'free_capacity_gb': 24.56, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:018eb22d-04d2-464f-8294-675d033013df:cinder:othervolumes', 'backend_state': 'up', 'volume_backend_name': 'ceph2', 'replication_enabled': False, 'allocated_capacity_gb': 0, 'filter_function': None, 'goodness_function': None, 'timestamp': '2023-08-03T22:23:59.050934'}}, status=in-use. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1924}}
Aug 03 22:24:16.834270 np0034853654 cinder-volume[116332]: DEBUG os_brick.initiator.linuxrbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] opening connection to ceph cluster (timeout=-1). {{(pid=116332) connect /opt/stack/os-brick/os_brick/initiator/linuxrbd.py:70}}
Aug 03 22:24:16.861112 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:24:16.889427 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-1c151856-e8fb-41e3-ad42-36810f4fcec8 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Migration in-use volume between different pools. Falling back to generic migration. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1958}}

retype volume:

Aug 03 22:25:38.269904 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Issue driver.migrate_volume. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/manager.py:2609}}
Aug 03 22:25:38.271201 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Attempting RBD assisted volume migration. volume: 59681499-b1b4-4fcb-af7a-24a64ded93df, host: {'host': 'np0034853654@ceph2#ceph2', 'cluster_name': None, 'capabilities': {'vendor_name': 'Open Source', 'driver_version': '1.2.0', 'storage_protocol': 'ceph', 'total_capacity_gb': 24.61, 'free_capacity_gb': 24.61, 'reserved_percentage': 0, 'multiattach': True, 'thin_provisioning_support': True, 'max_over_subscription_ratio': '20.0', 'location_info': 'ceph:/etc/ceph/ceph.conf:018eb22d-04d2-464f-8294-675d033013df:cinder:othervolumes', 'backend_state': 'up', 'volume_backend_name': 'ceph2', 'replication_enabled': False, 'allocated_capacity_gb': 0, 'filter_function': None, 'goodness_function': None, 'timestamp': '2023-08-03T22:24:59.055464'}}, status=retyping. {{(pid=116332) migrate_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1924}}
Aug 03 22:25:38.271569 np0034853654 cinder-volume[116332]: DEBUG os_brick.initiator.linuxrbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] opening connection to ceph cluster (timeout=-1). {{(pid=116332) connect /opt/stack/os-brick/os_brick/initiator/linuxrbd.py:70}}
Aug 03 22:25:38.289482 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:38.309237 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.129720 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.170401 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] volume has no backup snaps {{(pid=116332) _delete_backup_snaps /opt/stack/cinder/cinder/volume/drivers/rbd.py:1104}}
Aug 03 22:25:39.170857 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df is not a clone. {{(pid=116332) _get_clone_info /opt/stack/cinder/cinder/volume/drivers/rbd.py:1127}}
Aug 03 22:25:39.177397 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] deleting rbd volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df {{(pid=116332) delete_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1247}}
Aug 03 22:25:39.242308 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] moving volume volume-59681499-b1b4-4fcb-af7a-24a64ded93df to trash {{(pid=116332) _try_remove_volume /opt/stack/cinder/cinder/volume/drivers/rbd.py:1235}}
Aug 03 22:25:39.318993 np0034853654 cinder-volume[116332]: INFO cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Successful RBD assisted volume migration.
Aug 03 22:25:39.347198 np0034853654 cinder-volume[116332]: INFO cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Migrate volume completed successfully.
Aug 03 22:25:39.383312 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.410340 np0034853654 cinder-volume[116332]: DEBUG cinder.volume.drivers.rbd [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] connecting to cinder@ceph (conf=/etc/ceph/ceph.conf, timeout=-1). {{(pid=116332) _do_conn /opt/stack/cinder/cinder/volume/drivers/rbd.py:480}}
Aug 03 22:25:39.435421 np0034853654 cinder-volume[116332]: DEBUG cinder.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Notifying Schedulers of capabilities ... {{(pid=116332) _publish_service_capabilities /opt/stack/cinder/cinder/manager.py:197}}
Aug 03 22:25:39.442048 np0034853654 cinder-volume[116332]: INFO cinder.volume.manager [None req-91712065-b862-4ce9-952d-75183cfd3ce9 tempest-TestVolumeMigrateRetypeAttached-2102186043 None] Retype volume completed successfully.
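To check which path each request took across a whole c-vol log, the lines can be grouped by request ID and matched against the messages seen in the excerpts above. A minimal sketch, assuming the log message formats shown here stay the same; the `summarize` helper and its regular expressions are hypothetical, not part of cinder.

```python
import re
from collections import defaultdict

# Request ID inside "[None req-<uuid> ...]"
REQ_RE = re.compile(r"\[None (req-[0-9a-f-]+) ")
# Volume status at the end of the "Attempting RBD assisted volume migration" line
STATUS_RE = re.compile(r"status=([a-z-]+)\.")

ATTEMPT_MSG = "Attempting RBD assisted volume migration"
FALLBACK_MSG = "Falling back to generic migration"
ASSISTED_MSG = "Successful RBD assisted volume migration"


def summarize(lines):
    """Return {request_id: {'status': ..., 'path': ...}} for migrate attempts."""
    result = defaultdict(dict)
    for line in lines:
        req = REQ_RE.search(line)
        if not req:
            continue
        entry = result[req.group(1)]
        status = STATUS_RE.search(line)
        if ATTEMPT_MSG in line and status:
            entry["status"] = status.group(1)
        if FALLBACK_MSG in line:
            entry["path"] = "generic"
        elif ASSISTED_MSG in line:
            entry["path"] = "assisted"
    # Keep only requests that actually attempted an RBD migration
    return {k: v for k, v in result.items() if "status" in v}
```

Passing an open log file (for example `summarize(open(path))`) would report 'in-use' with the generic path for the migrate request and 'retyping' with the assisted path for the retype request in the excerpts above.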

[1] (Note: I tried to include stable/train in the test runs, but the job could not complete devstack setup because it failed to install the dependencies needed to run with the tempest master branch.)
[2] https://review.opendev.org/c/openstack/tempest/+/890360
[3] https://github.com/openstack/cinder/blob/ff4b1c910e65274efcbc0fd052f1f9bc5a643603/cinder/volume/drivers/rbd.py#L2214