Multiple part power increases lead to misplaced data

Bug #1910589 reported by Tim Burke on 2021-01-07
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Importance: Undecided
Assigned to: Unassigned

Bug Description

I ran through a part power increase once, and everything was great. I put some data in my vsaio, and at every step along the way I could use swift-account-audit to verify everything was still accessible: http://paste.openstack.org/show/801495/

My part power was still ridiculously low, so I went to increase it again. It starts out well enough:

========================================
vagrant@saio:~/swift$ swift-ring-builder /etc/swift/object.builder prepare_increase_partition_power
The next partition power is now 6.
The change will take effect after the next write_ring.
Ensure your proxy-servers, object-replicators and
reconstructors are using the changed rings and relink
(using swift-object-relinker) your existing data
before the partition power increase
vagrant@saio:~/swift$ swift-ring-builder /etc/swift/object.builder write_ring
vagrant@saio:~/swift$ swift-account-audit AUTH_test
Auditing account "AUTH_test"
Auditing container "c"

  Accounts checked: 1

Containers checked: 1

   Objects checked: 83
========================================
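For context on why the relink step matters: Swift picks an object's partition from the top `part_power` bits of an MD5 hash of its path, so raising the power by one splits each old partition `p` into `2p` and `2p + 1`, and the relinker's job is to hard-link existing diskfiles into their new partition directories before the ring switches over. The sketch below is a simplified illustration, not Swift's code: `partition_for` is a hypothetical helper, and it hashes the bare path, ignoring the cluster's hash path prefix/suffix.

```python
import hashlib
import struct


def partition_for(path: str, part_power: int) -> int:
    """Return the partition for `path` using the top `part_power` bits of its MD5.

    Hypothetical, simplified helper: real Swift mixes a per-cluster hash path
    prefix/suffix into the hashed string; this hashes the path alone.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    # Interpret the first 4 bytes as a big-endian unsigned 32-bit integer.
    top32 = struct.unpack_from(">I", digest)[0]
    return top32 >> (32 - part_power)


if __name__ == "__main__":
    path = "/AUTH_test/c/some-object"
    old = partition_for(path, 5)
    new = partition_for(path, 6)
    # One extra bit of partition power splits partition p into 2p and 2p + 1,
    # so shifting the new partition down by one bit recovers the old one.
    assert new >> 1 == old
    print(f"part power 5 -> partition {old}; part power 6 -> partition {new}")
```

Because every new partition is derived from exactly one old partition, relinking can be done partition by partition, which is also what makes per-partition progress tracking possible.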

The relink output looked a little funky (though nothing actually errored), and the audit kept passing, so I kept going:

========================================
vagrant@saio:~/swift$ for i in {1..4}; do swift-object-relinker relink --devices /srv/node$i ; done
Relinking files for policy default under /srv/node1
Relinked 0 diskfiles (0 errors)
Relinking files for policy default under /srv/node2
Relinked 0 diskfiles (0 errors)
Relinking files for policy default under /srv/node3
Relinked 0 diskfiles (0 errors)
Relinking files for policy default under /srv/node4
Relinked 0 diskfiles (0 errors)
vagrant@saio:~/swift$ swift-account-audit AUTH_test
Auditing account "AUTH_test"
Auditing container "c"

  Accounts checked: 1

Containers checked: 1

   Objects checked: 83
vagrant@saio:~/swift$ swift-ring-builder /etc/swift/object.builder increase_partition_power
The partition power is now 6.
The change will take effect after the next write_ring.
vagrant@saio:~/swift$ swift-ring-builder /etc/swift/object.builder write_ring
========================================
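Once `increase_partition_power` is written out, requests are routed to the new partitions, so an object is only found if a link to it already exists under the new partition directory. A minimal sketch of that, assuming a hypothetical flattened layout of `<root>/objects/<part>/<name>` (real Swift nests suffix and hash directories under each partition) and hypothetical helpers `object_path` and `relink_one`:

```python
import os
import tempfile


def object_path(root, part, name):
    """Hypothetical simplified layout: <root>/objects/<part>/<name>."""
    return os.path.join(root, "objects", str(part), name)


def relink_one(root, old_part, new_part, name):
    """Hard-link a file from its old partition directory into the new one."""
    src = object_path(root, old_part, name)
    dst = object_path(root, new_part, name)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    os.link(src, dst)  # same inode, so no object data is copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old_part, new_part, name = 3, 7, "1234.data"  # 7 == 2 * 3 + 1
        src = object_path(root, old_part, name)
        os.makedirs(os.path.dirname(src))
        with open(src, "w") as f:
            f.write("payload")
        # With the new ring, lookups go to new_part: nothing there yet.
        print(os.path.exists(object_path(root, new_part, name)))  # False
        relink_one(root, old_part, new_part, name)
        print(os.path.exists(object_path(root, new_part, name)))  # True
```

If the relink step skips this, the file still exists on disk but only under the old partition, which is exactly the "misplaced data" in the title.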

But at this point it all goes to pot:

========================================
vagrant@saio:~/swift$ swift-account-audit AUTH_test
Auditing account "AUTH_test"
Auditing container "c"
  Bad status HEADing object "/AUTH_test/c/..." on 127.0.0.3/sdb7
  Bad status HEADing object "/AUTH_test/c/..." on 127.0.0.1/sdb5
  ...
  Failed fo fetch object /AUTH_test/c/... at all!

  Accounts checked: 1

Containers checked: 1

   Objects checked: 83
  Missing Replicas: 492
========================================

All those "Relinked 0 diskfiles (0 errors)" lines? We didn't relink *anything*! The trouble is the progress state we introduced in https://review.opendev.org/c/openstack/swift/+/695344. It keeps us from needing to reprocess partitions if the process needs to be restarted (which is good and useful!), but there's nothing to remove the status files once a particular increase completes. The next increase then finds the leftover state, treats its partitions as already done, and skips them.
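A minimal sketch of the failure mode and one possible fix, under stated assumptions: the JSON layout, the `next_part_power` key, and the `load_done`/`save_done`/`relink` helpers are hypothetical and are not swift-object-relinker's actual state format. A record of finished partitions lets a restarted run skip work; if that record outlives its increase, the next increase sees everything as done. Storing the target part power in the record and ignoring mismatches prevents the reuse.

```python
import json
import os
import tempfile


def load_done(state_path, next_part_power, check_power=True):
    """Return the partitions a previous run recorded as finished.

    With check_power=True, progress recorded for a different target part
    power is treated as stale and ignored.
    """
    try:
        with open(state_path) as f:
            state = json.load(f)
    except FileNotFoundError:
        return set()
    if check_power and state.get("next_part_power") != next_part_power:
        return set()  # left over from an earlier increase: start fresh
    return set(state["done"])


def save_done(state_path, next_part_power, done):
    """Persist progress so a restarted run can skip finished partitions."""
    with open(state_path, "w") as f:
        json.dump({"next_part_power": next_part_power, "done": sorted(done)}, f)


def relink(partitions, state_path, next_part_power, check_power=True):
    """Simulate a relink pass; return how many partitions were processed."""
    done = load_done(state_path, next_part_power, check_power)
    processed = 0
    for part in partitions:
        if part in done:
            continue  # recorded as finished by an earlier run
        # ... hard-link this partition's diskfiles into their new locations ...
        done.add(part)
        processed += 1
        save_done(state_path, next_part_power, done)
    return processed


if __name__ == "__main__":
    parts = range(8)  # simplification: both increases walk the same list
    with tempfile.TemporaryDirectory() as tmp:
        buggy = os.path.join(tmp, "buggy.json")
        fixed = os.path.join(tmp, "fixed.json")
        for path in (buggy, fixed):
            relink(parts, path, next_part_power=5)  # first increase
        # Second increase: ignoring the part power trusts the stale record.
        print(relink(parts, buggy, next_part_power=6, check_power=False))  # 0
        print(relink(parts, fixed, next_part_power=6))  # 8
```

Deleting the status file once an increase completes would avoid the same problem; recording the target power has the extra property that a stale file is ignored even if that cleanup step never runs.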
