Comment 2 for bug 1809472

kirubakaran kaliannan (kirubak) wrote :

I raised this bug after the following scenario in a production setup:

We have a 2-way storage policy. One of the drives experienced XFS filesystem corruption.
We ran xfs_repair, which moved some objects to lost+found. Objects in lost+found also lose their metadata, so we could not recover them and left it to the replicator to take care of the missing objects.

Also, there is no way for us to identify which suffixes in which partitions need to be invalidated after xfs_repair. And operators never find out that we are running with just a single copy of an object.

I agree that recalculating 1/10 of the suffixes on every replicator run is an unacceptable amount of IO.

But in this situation, after xfs_repair, we need a way to recalculate the hashes for all suffixes.
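
To make the request concrete, here is a rough sketch of a one-off "rehash everything on this drive" step. It is not an existing Swift tool. It assumes the usual on-disk layout (<device>/objects or <device>/objects-N, then <partition>/hashes.pkl) and assumes that removing a partition's hashes.pkl makes the replicator recalculate every suffix hash in that partition on its next pass. The function names and the dry_run flag are made up for illustration.

```python
import os


def find_partition_hash_files(device_path):
    """Return hashes.pkl paths under <device>/objects*/<partition>/.

    Assumption: object data dirs are named 'objects' or 'objects-N'
    and each partition dir keeps its suffix hashes in 'hashes.pkl'.
    """
    found = []
    for datadir in sorted(os.listdir(device_path)):
        if datadir != "objects" and not datadir.startswith("objects-"):
            continue  # skip lost+found, accounts, containers, ...
        datadir_path = os.path.join(device_path, datadir)
        if not os.path.isdir(datadir_path):
            continue
        for partition in sorted(os.listdir(datadir_path)):
            candidate = os.path.join(datadir_path, partition, "hashes.pkl")
            if os.path.isfile(candidate):
                found.append(candidate)
    return found


def force_rehash(device_path, dry_run=True):
    """List (and, unless dry_run, remove) every partition hash file.

    Hypothetical helper: with dry_run=True nothing is deleted, so the
    operator can review the list first.
    """
    paths = find_partition_hash_files(device_path)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

An operator would run force_rehash() with dry_run=True against the repaired drive's mount point, check the list, and then run it again with dry_run=False. This costs a full rehash of that one drive once, instead of a periodic rehash on every replicator run.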

Questions:

1. Is there currently any procedure to follow after running xfs_repair on a filesystem, to make sure all replicas of the objects are present?

2. Some way of self-healing is required in this situation. Any thoughts?