Autopilot can hang while killing a process

Bug #1449153 reported by Leo Arias
This bug affects 1 person
Affects: Autopilot
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Digging into https://jenkins.qa.ubuntu.com/job/generic-deb-autopilot-runner-vivid-mako/2127/console I found that for some reason the process is left defunct after autopilot tries to kill it:

elopio 6850 0.1 0.0 0 0 ? Zs 10:59 0:00 [webapp-containe] <defunct>

In that state, when we call process.communicate() in the _kill_process method, autopilot hangs.

We should pass a timeout to the communicate method.
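
A minimal sketch of that suggestion, assuming Python 3.3 or later, where Popen.communicate() accepts a timeout argument. The helper name kill_process and the 10-second default are hypothetical; this is not autopilot's actual _kill_process code.

```python
import subprocess


def kill_process(process, timeout=10):
    """Kill `process` and collect its output without blocking forever.

    Hypothetical helper, not autopilot's real _kill_process.
    Returns (stdout, stderr, returncode). If the process cannot be
    reaped within `timeout` seconds, stdout and stderr are None and
    returncode is whatever poll() reports (None if still not reaped).
    """
    process.kill()  # SIGKILL on POSIX
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Give up instead of hanging the whole test run.
        return None, None, process.poll()
    return stdout, stderr, process.returncode
```

With this shape the caller decides what to do when the timeout expires (log it, fail the test, retry), rather than blocking indefinitely.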

Vincent Ladeuil (vila) wrote :

Do we have a reproducible recipe with a lost setup ?

Changed in autopilot:
status: New → Confirmed
Vincent Ladeuil (vila) wrote :

<cough>

I meant: do we have a way to reproduce this locally?

Leo Arias (elopio) wrote :

I'm guessing that we can make a process that catches the first kill signal and does nothing else, so communicate() will wait forever. I haven't tried it, though.
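
One way that guess could be tried locally, as an untested-in-autopilot sketch for POSIX systems: the child below ignores SIGTERM, so communicate() after terminate() keeps blocking until the timeout fires. The names CHILD_SOURCE and terminate_hangs are hypothetical, and whether this matches the defunct state seen on Jenkins is not established in this report.

```python
import subprocess
import sys

# Hypothetical child: ignores SIGTERM, announces it is ready, then sleeps.
CHILD_SOURCE = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
time.sleep(60)
"""


def terminate_hangs(timeout=2):
    """Return True if communicate() is still blocked after SIGTERM."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
    )
    # Wait until the child has installed its handler before signalling.
    child.stdout.readline()
    child.terminate()  # SIGTERM, which the child ignores
    try:
        child.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        hung = True
        child.kill()  # SIGKILL cannot be ignored
        child.communicate()  # reap the child and close the pipe
    else:
        hung = False
    return hung
```

Without the timeout argument, the first communicate() call would block for as long as the child keeps running.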

Vincent Ladeuil (vila) wrote :

https://bazaar.launchpad.net/~udd/udd/import-scripts/view/head:/udd/tests/test_threads.py has tests for monitoring subprocesses using SIGTERM and SIGKILL.

The corresponding code has been running in production for years without issues in that specific area (simple but robust subprocess tracking).
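
For readers unfamiliar with the SIGTERM then SIGKILL pattern, here is a rough sketch of the general idea. It is not taken from the udd code linked above; the helper name stop_process and both timeout defaults are hypothetical.

```python
import subprocess


def stop_process(process, term_timeout=5, kill_timeout=5):
    """Stop `process` with SIGTERM, escalating to SIGKILL if needed.

    Hypothetical helper. Returns the exit code, or None if the process
    still has not been reaped after both timeouts.
    """
    process.terminate()  # polite request: SIGTERM on POSIX
    try:
        return process.wait(timeout=term_timeout)
    except subprocess.TimeoutExpired:
        pass  # the process did not exit; escalate
    process.kill()  # forceful: SIGKILL on POSIX
    try:
        return process.wait(timeout=kill_timeout)
    except subprocess.TimeoutExpired:
        return None
```

Every wait is bounded, so the caller never blocks longer than term_timeout + kill_timeout. If the process was started with stdout or stderr pipes, using communicate(timeout=...) in place of wait(timeout=...) avoids a deadlock on a full pipe.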
