sharder can spin on a DB with bad epoch
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Object Storage (swift) | New | Undecided | Unassigned |
Bug Description
Sometimes a DB file's name carries an epoch that doesn't match the epoch recorded on the own-shard-range in the DB. Usually, this is because some other node updated the epoch, the shard range replicated over, and the DB that received it hasn't had a chance to re-shard yet.
If there's some filesystem corruption that impacts the DB name, though, you can get into trouble. If it's a bitflip that lowers the on-disk epoch, you're OK: it looks a lot like the normal-operations path above. On the other hand, if it *raises* it... the sharder sees that the on-disk epoch is different; creates a new, empty DB with the epoch from the own shard range; copies everything over; and finally deletes the DB it just created :-/
Next cycle, the exact same thing happens.
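To make the two cases concrete, here is a minimal sketch of the comparison, not the sharder's actual code. It assumes DB file names look like `<hash>_<epoch>.db`, with the epoch as a decimal timestamp; `epoch_from_db_name` and `classify_epoch` are hypothetical helpers invented for illustration, and plain floats stand in for whatever timestamp type the real code uses.

```python
import os


def epoch_from_db_name(db_path):
    """Return the epoch embedded in a DB file name, or None if there is none.

    Assumes names of the form '<hash>_<epoch>.db' (an assumption for this
    sketch); a plain '<hash>.db' has no epoch.
    """
    stem, _ext = os.path.splitext(os.path.basename(db_path))
    _hash, sep, epoch = stem.partition('_')
    return float(epoch) if sep else None


def classify_epoch(db_path, own_shard_range_epoch):
    """Compare the on-disk (file name) epoch with the own-shard-range epoch."""
    file_epoch = epoch_from_db_name(db_path)
    if file_epoch is None or own_shard_range_epoch is None:
        return 'unknown'
    if file_epoch == own_shard_range_epoch:
        return 'match'
    if file_epoch < own_shard_range_epoch:
        # Looks like the normal path: the shard range was updated elsewhere
        # and this DB has not re-sharded yet.
        return 'file-older'
    # The case described in this report: the name claims a newer epoch than
    # the own shard range, which should not happen in normal operation.
    return 'file-newer'
```

A check like this could let the sharder log and skip (or quarantine) a `'file-newer'` DB rather than repeating the same create/copy/delete work every cycle; whether that is the right fix is a separate question.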