Comment 37 for bug 1640547

Colin Ian King (colin-king) wrote :

I've made some modifications to the script (see attached). The changes include:

1. kill with ALRM first, then kill with KILL if the stressor has not exited after a small grace period. Also report on unkillable stressors
2. bump up async I/O threshold for machines with lots of CPUs
3. force hdd to do sync writes, that way we don't backlog with gazillions of pending I/Os on machines with a lot of memory and many CPUs
4. limit readahead file size so that this stressor does not spend most of its time generating a test file before it can start testing readaheads
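
For anyone who does not want to open the attachment, change 1 can be sketched roughly as below. This is only an illustration of the ALRM-then-KILL idea, not the code from the attached script: the function name stop_with_grace and the default 5 second grace period are made up for this example.

```shell
# Hypothetical helper (not taken from the attached script):
# send SIGALRM, wait up to $grace seconds, then escalate to SIGKILL
# and report the process if it still will not die.
stop_with_grace() {
    pid=$1
    grace=${2:-5}   # assumed default; pick whatever suits the test

    # Polite request first. If the signal cannot be delivered,
    # the process is already gone.
    kill -s ALRM "$pid" 2>/dev/null || return 0

    waited=0
    while [ "$waited" -lt "$grace" ]; do
        # kill -0 only checks that the pid still exists
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    echo "pid $pid still running after SIGALRM, sending SIGKILL" >&2
    kill -s KILL "$pid" 2>/dev/null
    sleep 1

    if kill -0 "$pid" 2>/dev/null; then
        # Typically a process stuck in uninterruptible I/O wait
        echo "pid $pid is unkillable" >&2
        return 1
    fi
    return 0
}
```

It would be called as something like `stop_with_grace "$pid" 10`, with a non-zero return meaning the stressor survived even SIGKILL.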

I've run this through several times with the latest stress-ng and it runs through to completion.

So I think we were suffering from issues where loads of pending I/Os from stressors, plus bad cleanup of nuked stressors, were causing massive I/O backlogs that made the system clog up.