object replicator doesn't recalculate the checksum of the suffix even when do_listdir is True

Bug #1809472 reported by kirubakaran kaliannan
Affects: OpenStack Object Storage (swift)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

On every run, the object replicator recalculates 10% of the suffix checksums, using the following code:
def _do_listdir(partition, replication_cycle):
    return (((partition + replication_cycle) % 10) == 0)
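
For example (a quick self-contained check, using an arbitrary partition number purely for illustration), each partition hits the listdir path once every ten cycles:

def _do_listdir(partition, replication_cycle):
    return (((partition + replication_cycle) % 10) == 0)

# Partition 7, for example, does the extra listdir on cycles 3, 13, 23, ...
print([cycle for cycle in range(30) if _do_listdir(7, cycle)])  # -> [3, 13, 23]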

But in __get_hashes(), although the listdir is performed, the suffix hash is not set to None; setdefault(suff, None) is used instead. This doesn't help when the hashes recorded for a suffix are wrong or stale (for example after xfs_repair).

We need to explicitly set hashes[suff] to None so the suffix hash is recalculated.

Following is the proposed change. Please let me know if this change is good.

        if do_listdir:
            for suff in os.listdir(partition_path):
                if len(suff) == 3:
-                   hashes.setdefault(suff, None)
+                   hashes[suff] = None
            modified = True
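
For clarity, here is a minimal standalone sketch (plain dict semantics only, not the actual swift code) of why setdefault() leaves a stale hash in place while direct assignment forces a recalculation:

# Suppose hashes.pkl already holds a (possibly stale) hash for suffix 'abc'.
hashes = {'abc': 'deadbeef'}

# Current behaviour: setdefault() is a no-op for keys that already exist,
# so the stale value survives and the suffix is never re-hashed.
hashes.setdefault('abc', None)
assert hashes['abc'] == 'deadbeef'

# Proposed behaviour: assignment unconditionally resets the entry to None,
# so the next hashing pass recalculates it from the files on disk.
hashes['abc'] = None
assert hashes['abc'] is None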

summary:
- object replicator dont recalculate the checksum of the suffix even
+ object replicator dont recalculate the checksum of the suffix even when do_listdir is True
Revision history for this message
clayg (clay-gerrard) wrote :

The intent there was not to force a recalculation of un-dirtied suffixes every 10th replication cycle.

That hook was only meant to do a *single* listdir IO to check whether there are any suffixes on disk that are not in the hashes.pkl - if a suffix is in the hashes.pkl (and not dirtied), the current understanding of observed operating systems is that the hash would be correct [1].

Honestly, the whole endeavor seemed suspect. I'm not sure anyone really observed production systems syncing around suffixes and failing to invalidate them with a post-REPLICATE request, but it was only *one* syscall/IO, and the idea of a deterministic way to do it periodically was sort of cute, so I guess someone decided it was worth the complexity.

1. Understood to be correct, barring a bug. Operators (myself among them) have in the past observed mis-hashed suffixes (i.e., an un-dirtied suffix hash in hashes.pkl that doesn't match the recalculated hash of the files in the suffix) - but those have all been explainable by bugs introduced with the hashes.invalid change, all of which have since been fixed, and I haven't yet observed the issue on clusters deployed since we fixed those bugs. But because such a class of bugs is known to exist, there is broad support/interest in having an operator tunable that sets a *timer* to force a recalc of any un-dirtied suffix after some... weeks. But doing it every 10 cycles would burn an unacceptable amount of IO on stable systems.
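
A rough sketch of what such a timer could look like (the name RECALCULATE_HASHES_INTERVAL and the mtime check are purely illustrative, not an existing swift tunable):

import os
import time

# Hypothetical operator tunable: force a recalc of un-dirtied suffixes once
# the cached hashes.pkl is older than this many seconds (here, two weeks).
RECALCULATE_HASHES_INTERVAL = 14 * 24 * 3600

def hashes_pkl_is_stale(partition_path, now=None):
    """Return True if this partition's hashes.pkl is old enough that its
    un-dirtied suffix hashes should be recalculated anyway."""
    now = now if now is not None else time.time()
    try:
        mtime = os.path.getmtime(os.path.join(partition_path, 'hashes.pkl'))
    except OSError:
        return True  # no hashes.pkl yet; it will be rebuilt from scratch anyway
    return (now - mtime) > RECALCULATE_HASHES_INTERVAL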

Revision history for this message
kirubakaran kaliannan (kirubak) wrote :

I raised this bug after the following scenario in a production setup.

We have a 2-way storage policy, and one of the drives experienced XFS filesystem corruption.
We ran xfs_repair, which moved some objects to lost+found (the objects in lost+found also lost their metadata, so we cannot recover them and have to leave it to the replicator to take care of the missing objects).

Also, there is no way for us to identify which suffixes in which partitions need to be invalidated after xfs_repair, and operators never know that we are running with only a single copy of those objects.

I agree that recalculating 1/10 of the suffixes on every replicator run is an unacceptable amount of IO.

But in this situation, after xfs_repair, we need a way to recalculate the hashes of all the suffixes.
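
(One blunt operational workaround - a sketch only, assuming the default /srv/node/<device>/objects layout, with an illustrative device path - is to delete the cached hashes.pkl files under the repaired device so every suffix is re-hashed on the next replication pass:)

import glob
import os

# Illustrative path to the repaired device's object partitions.
device_objects = '/srv/node/sdb1/objects'

# Removing hashes.pkl forces the replicator to re-listdir and re-hash every
# suffix in the partition the next time it handles a REPLICATE for it.
for pkl in glob.glob(os.path.join(device_objects, '*', 'hashes.pkl')):
    os.unlink(pkl)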

Questions:

1. Do we currently have any procedure to follow after running xfs_repair on a filesystem, to make sure all replica objects are present?

2. Some way of self-healing is needed in this situation. Any thoughts?
