Comment 143 for bug 1573062

Colin Ian King (colin-king) wrote :

We're getting zombies here which aren't being reaped:

130428 ? Z 0:00 [stress-ng-brk] <defunct>
130432 ? Z 0:00 [stress-ng-brk] <defunct>
130434 ? Z 0:00 [stress-ng-brk] <defunct>
130436 ? Z 0:00 [stress-ng-brk] <defunct>

The reason for this is that memory stressors such as brk have a parent that forks off a child. The child performs the stressing, and if it gets OOM'd the parent can spawn off another stressor. So I think the SIGKILL on the stress-ng brk stressor is killing the parent, but the child (which is still holding onto a load of memory on the heap) is not being waited for and hence is left in a memory-hogging zombie state. We may be in a pathologically memory-hogging state because the zombies may be holding brk regions that have been swapped out to disk due to memory pressure, so we hit a low-memory state that is never cleared up.

I suggest modifying the test bash script as follows:

1. Run stress-ng with the -k flag (so that all the processes have the same stress-ng name).
2. Kill with ALRM first.
3. Then, after a small grace period, kill all the stress-ng processes with KILL.
4. Report on unkillable stressors.
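
As a rough illustration of the steps above, the kill-and-report part could look something like the sketch below. This is not the code from the linked disk_stress_ng script; the function name stop_stressors, the 10 second default grace period and the message text are assumptions made up for this example, and it assumes pgrep and pkill from procps are installed.

```shell
#!/bin/bash
# Sketch only: the function name, default grace period and message are
# illustrative assumptions, not taken from the disk_stress_ng script.
# Start the stressors with -k so every process keeps the name "stress-ng",
# for example:  stress-ng -k --brk 4 &

stop_stressors() {
    local name="${1:-stress-ng}"   # exact process name to match
    local grace="${2:-10}"         # assumed grace period, in seconds
    local leftover

    # Ask the stressors to stop by sending ALRM first.
    pkill -ALRM -x "$name" || true
    sleep "$grace"

    # Anything still present after the grace period gets KILL.
    if pgrep -x "$name" > /dev/null; then
        pkill -KILL -x "$name" || true
        sleep 1
    fi

    # Report whatever is still listed after KILL.
    leftover=$(pgrep -x "$name" || true)
    if [ -n "$leftover" ]; then
        echo "unkillable $name processes:" $leftover
        return 1
    fi
    return 0
}
```

Calling stop_stressors with no arguments targets processes named stress-ng; a non-zero return means something survived KILL, which the test script can then flag as a failure.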

Refer to the changes I made to https://launchpadlibrarian.net/296974522/disk_stress_ng