We've resolved our issues by disabling KSM on the affected nodes. None of the unaffected nodes had KSM enabled (due to a packaging bug elsewhere). After we disabled KSM, our problems gradually went away over about 3 days.
This means we're no longer affected by this issue (and given the other reports, probably never were).