Comment 34 for bug 1640547

Colin Ian King (colin-king) wrote :

I'm still not happy about /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng using timeout with -9 (SIGKILL) to terminate stress-ng stressors. Stress-ng stressors can be *cleanly* terminated with a SIGALRM signal, which triggers all the processes to terminate once they have freed their resources. Sending a SIGKILL will leave cruft everywhere and won't clean up shared memory segments, which isn't pleasant. It may even lead to deadlocks in some of the stressors that are waiting for an unlock, because the unlocking parent gets nuked with a SIGKILL.

Note that some I/O-related stressors generate a lot of writes that need to be flushed out before a write and/or a close completes, so sending a SIGALRM or SIGKILL may not do anything immediately, as the blocked system call is waiting for the I/O to fully flush out.

I'm starting to think that SIGKILL'ing and reaping temp files while stressors are still running could be perilous; for example, locking files, killing off processes while the locks are still held, and then reaping the files is not a good idea.

So:

1. The checkbox script should run a test for a specified amount of time AND, I suggest, a maximum number of bogo ops, whichever comes first.

2. Stop stress-ng with SIGALRM and not SIGKILL.

3. Don't reap files while a stressor is running. That's a really ugly thing to do.
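
As a rough illustration of the three points above, here is a minimal Python sketch of what a wrapper could do. It is not the checkbox script and not how disk_stress_ng is written. The function names, the temporary directory prefix and the grace period are all hypothetical. The stress-ng options used (--hdd, --hdd-ops, --timeout) exist in stress-ng, but the values are placeholders.

```python
import shutil
import signal
import subprocess
import sys
import tempfile


def stress_ng_hdd_cmd(workers, max_ops, run_secs):
    """Build a stress-ng command bounded by time AND bogo ops (point 1).

    stress-ng stops by itself at whichever limit is hit first.
    """
    return ["stress-ng", "--hdd", str(workers),
            "--hdd-ops", str(max_ops),
            "--timeout", "%ds" % run_secs]


def run_then_alarm(cmd, run_secs, grace_secs, workdir=None):
    """Run cmd; if it is still going after run_secs, send SIGALRM (point 2).

    Returns the exit status, or None if the process is still alive after
    the grace period (e.g. blocked flushing I/O). SIGKILL is never sent.
    """
    proc = subprocess.Popen(cmd, cwd=workdir)
    try:
        return proc.wait(timeout=run_secs)
    except subprocess.TimeoutExpired:
        pass
    proc.send_signal(signal.SIGALRM)
    try:
        return proc.wait(timeout=grace_secs)
    except subprocess.TimeoutExpired:
        return None


def run_disk_test(cmd, run_secs, grace_secs):
    """Run cmd in a scratch directory; reap files only after exit (point 3)."""
    workdir = tempfile.mkdtemp(prefix="disk_stress_")  # hypothetical prefix
    status = run_then_alarm(cmd, run_secs, grace_secs, workdir)
    cleaned = False
    if status is not None:
        # The process has exited, so nothing still holds files or locks here.
        shutil.rmtree(workdir, ignore_errors=True)
        cleaned = True
    return status, cleaned


if __name__ == "__main__":
    # Placeholder values; the wrapper waits a little longer than
    # stress-ng's own --timeout so SIGALRM is only a fallback.
    cmd = stress_ng_hdd_cmd(workers=2, max_ops=100000, run_secs=60)
    print(run_disk_test(cmd, run_secs=70, grace_secs=120))
```

If the status comes back as None, the stressor is probably still blocked on I/O; in this sketch the scratch directory is deliberately left alone in that case rather than escalating to SIGKILL.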

Once we have this fixed, I am then happy to check out any kernel-related hangs. Meanwhile, I'll push some trivial stress-ng changes out tomorrow in another bug fix release to address some of the corner cases I've spotted today.