Relinker marks partition as relinked even when there were errors

Bug #1926648 reported by Tim Burke
This bug affects 1 person
Affects: OpenStack Object Storage (swift)
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

The relinker is meant to be an idempotent tool: you can run it repeatedly until you're satisfied that everything has been linked where it needs to go. But even if there are errors while processing a partition:

vagrant@saio:~$ swift-object-relinker --devices /srv/node1/ relink
Processing files for policy replicated under /srv/node1/ (cleanup=False)
Error relinking: failed to relink /srv/node1/sdb1/objects-1/150/629/964cde5676d168a068572556b7eed629/1619719681.44005.data to /srv/node1/sdb1/objects-1/300/629/964cde5676d168a068572556b7eed629/1619719681.44005.data: [Errno 17] File exists: '/srv/node1/sdb1/objects-1/150/629/964cde5676d168a068572556b7eed629/1619719681.44005.data' -> '/srv/node1/sdb1/objects-1/300/629/964cde5676d168a068572556b7eed629/1619719681.44005.data'
Step: relink Device: sdb1 Policy: replicated Partitions: 1/1
1 hash dirs processed (cleanup=False) (1 files, 0 linked, 0 removed, 1 errors)

...we write down that the partition was relinked...

vagrant@saio:~$ < /srv/node1/sdb1/relink.objects-1.json jq
{
  "part_power": 8,
  "next_part_power": 9,
  "state": {
    "150": true
  }
}
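The state file maps partition numbers (as JSON string keys) to a completion flag, and a run skips any partition recorded as true. That is why the second run below processes nothing. Here's a minimal sketch for checking which partitions a state file claims are finished. It assumes only the layout shown in the jq output above. `completed_partitions` and `load_state` are hypothetical helpers, not part of swift.

```python
import json
from typing import Dict, List


def completed_partitions(state_doc: Dict) -> List[int]:
    """Return partitions the state file records as finished (value true).

    Assumes the layout from the report:
    {"part_power": ..., "next_part_power": ..., "state": {"<part>": <bool>}}.
    Hypothetical helper, not a swift API.
    """
    state = state_doc.get("state", {})
    return sorted(int(part) for part, done in state.items() if done)


def load_state(path: str) -> Dict:
    # Hypothetical helper: read a relink.<datadir>.json state file from disk.
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    # The contents from the report, inlined so the example runs anywhere.
    example = json.loads(
        '{"part_power": 8, "next_part_power": 9, "state": {"150": true}}'
    )
    print(completed_partitions(example))  # [150]
```

On the device from the report you'd call `completed_partitions(load_state("/srv/node1/sdb1/relink.objects-1.json"))`. Before the fix, partition 150 shows up there even though its only file failed to link.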

...which causes a subsequent run to skip it and gives a false sense that everything's OK:

vagrant@saio:~$ swift-object-relinker --devices /srv/node1/ relink
Processing files for policy replicated under /srv/node1/ (cleanup=False)
0 hash dirs processed (cleanup=False) (0 files, 0 linked, 0 removed, 0 errors)

Operators can work around this by removing the state file between runs, but then every partition gets processed again. It'd be much better if an error prevented the partition from being marked complete: operators could then do one run, check for errors, perform whatever manual intervention is necessary, and kick off another run that only processes the partitions that had errors.
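The "check for errors" step could be scripted against the summary line the relinker prints at the end of a run. Here's a minimal sketch, assuming the summary wording matches the output shown in this report exactly. Other swift releases may format it differently. `parse_summary` and `run_was_clean` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Matches summary lines like the ones in this report, e.g.
# "1 hash dirs processed (cleanup=False) (1 files, 0 linked, 0 removed, 1 errors)".
# Assumption: the wording is stable for the swift version in use.
SUMMARY_RE = re.compile(
    r"(?P<hash_dirs>\d+) hash dirs processed \(cleanup=(?:True|False)\) "
    r"\((?P<files>\d+) files, (?P<linked>\d+) linked, "
    r"(?P<removed>\d+) removed, (?P<errors>\d+) errors\)"
)


def parse_summary(output: str) -> Optional[Dict[str, int]]:
    """Return the counts from the last summary line in the output, or None."""
    matches = list(SUMMARY_RE.finditer(output))
    if not matches:
        return None
    return {name: int(value) for name, value in matches[-1].groupdict().items()}


def run_was_clean(output: str) -> bool:
    """True only if a summary line was found and it reports zero errors."""
    counts = parse_summary(output)
    return counts is not None and counts["errors"] == 0
```

On the unfixed relinker, the second run's summary (0 hash dirs, 0 errors) would pass this check even though partition 150 was never repaired. The check only becomes trustworthy once an error keeps a partition out of the state file.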

Changed in swift:
status: New → In Progress
Changed in swift by clayg (clay-gerrard):
importance: Undecided → Critical
Tim Burke (1-tim-z) wrote:

Fixed in https://review.opendev.org/c/openstack/swift/+/788089:

  relinker: Only mark partitions "done" if there were no (new) errors

  This way operators can re-run the relinker in the face of errors without
  needing to manually clear the state file.

  Change-Id: Ida1c1c0c8a695b1b226121b426b8226a43f3056b
  Co-Authored-By: Clay Gerrard <email address hidden>
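The idea in the commit message can be illustrated with a running error counter. Snapshot it before each partition and only mark the partition done if the counter hasn't moved. This is a simplified sketch of that idea, not the code from the linked change. `relink_partitions` and the `process_partition` callback, which stands in for the real per-partition work and returns its error count, are hypothetical.

```python
from typing import Callable, Dict, Iterable


def relink_partitions(
    partitions: Iterable[int],
    state: Dict[str, bool],
    process_partition: Callable[[int], int],
) -> int:
    """Process partitions, marking each done only if it added no new errors.

    `state` mirrors the "state" object in the JSON state file (string keys).
    `process_partition` is a hypothetical callback returning an error count.
    Returns the total number of errors seen during this run.
    """
    total_errors = 0
    for part in partitions:
        if state.get(str(part)):
            continue  # finished cleanly on an earlier run, so skip it
        errors_before = total_errors
        total_errors += process_partition(part)
        # Record success only when this partition contributed no (new) errors,
        # so the next run will retry any partition that failed.
        if total_errors == errors_before:
            state[str(part)] = True
    return total_errors
```

The pre-fix behavior amounts to setting `state[str(part)] = True` unconditionally after `process_partition`. That's how partition 150 ended up recorded as relinked despite its error.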

Will be included in swift 2.28.0.

Changed in swift:
status: In Progress → Fix Released