Comment 41 for bug 1423672

Revision history for this message
LouieGosselin (0-ubunbu-d) wrote :

It's happened again. I've spent several hours on this and I've been able to recreate the failure under some synthetic conditions with a sacrificial VM.

The filebench defaults do not cause an ext4 crash for me, but the following do:

load workloads/fileserver
set $dir=/tmp/
set $nfiles=200000
set $meandirwidth=30000
run 120

The ext4 error never happens in the filebench'es init phase, only 50s or so into the 50 threaded run phase. Less extreme settings won't produce a consistent crash.

Reducing the amount of free memory makes the errors much more likely.

This is before running filebench:
             total used free shared buffers cached
Mem: 482M 99M 382M 300K 27M 20M
-/+ buffers/cache: 52M 429M
Swap: 1.9G 94M 1.8G

This is while running filebench one second before the crash:
             total used free shared buffers cached
Mem: 482M 476M 5.6M 284K 27M 18M
-/+ buffers/cache: 430M 51M
Swap: 1.9G 253M 1.6G
2769.63

The error is reproducible in cloned VMs.

Moving swap to another disk changes nothing.

As far as I can tell, the error never happens with ext4 filesystems other than the root FS where executables are running from.

I've tried bonnie, stress-ng, and simple scripts, I have not been able to get these to crash ext4.

The sacrificial VM has not crashed after add an extra 500MB to it.

Although production was never under such heavy loads, I've added 500MB to the production VM to see if it helps anyways.