Autopilot can hang while killing a process

Bug #1449153 reported by Leo Arias on 2015-04-27
This bug affects 1 person
Affects: Autopilot
Importance: Undecided
Assigned to: Unassigned

Bug Description

Digging into https://jenkins.qa.ubuntu.com/job/generic-deb-autopilot-runner-vivid-mako/2127/console I found that for some reason the process is left defunct after autopilot tries to kill it:

elopio 6850 0.1 0.0 0 0 ? Zs 10:59 0:00 [webapp-containe] <defunct>

In that state, the process.communicate() call in the _kill_process method never returns, so autopilot gets stuck.

We should pass a timeout to the communicate method.
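
A minimal sketch of what that could look like, using only the standard subprocess module. The helper name kill_process_with_timeout and the 10-second default are assumptions made for illustration; this is not autopilot's actual _kill_process code.

```python
import subprocess
import sys


def kill_process_with_timeout(process, timeout=10):
    """Terminate `process` without ever blocking indefinitely.

    Hypothetical helper, not autopilot's real implementation.
    Returns (stdout, stderr, returncode). stdout and stderr are None
    if the final communicate() call also timed out.
    """
    process.terminate()  # sends SIGTERM on POSIX
    try:
        stdout, stderr = process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # SIGTERM was not enough, so escalate to SIGKILL.
        process.kill()
        try:
            stdout, stderr = process.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Give up rather than hang the whole test run.
            stdout, stderr = None, None
    return stdout, stderr, process.returncode


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    print(kill_process_with_timeout(child, timeout=5))
```

After a TimeoutExpired, communicate() can be called again without losing output, which is why the retry after kill() is safe.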

Vincent Ladeuil (vila) wrote :

Do we have a reproducible recipe with a lost setup ?

Changed in autopilot:
status: New → Confirmed
Vincent Ladeuil (vila) wrote :

<cough>

I meant: do we have a way to reproduce this locally?

Leo Arias (elopio) wrote :

I'm guessing that we can make a process that catches the first kill signal and does nothing else, so communicate will wait forever. Haven't tried though.
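
An untested-against-autopilot sketch of that idea: the child ignores SIGTERM (via SIG_IGN, a simplification of "catches the signal and does nothing"), so communicate() after terminate() does not return until something stronger is sent. The names CHILD_SOURCE, spawn_stubborn_child and communicate_hangs_after_sigterm are made up for this example, and it may not reproduce the defunct state seen on Jenkins.

```python
import subprocess
import sys

# Child program: ignore SIGTERM, announce readiness, then sleep.
CHILD_SOURCE = """
import signal, time
signal.signal(signal.SIGTERM, signal.SIG_IGN)
print("ready", flush=True)
time.sleep(60)
"""


def spawn_stubborn_child():
    """Start the child and wait until it has started ignoring SIGTERM."""
    process = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
    )
    process.stdout.readline()  # blocks until the child prints "ready"
    return process


def communicate_hangs_after_sigterm(timeout=2):
    """Return True if communicate() is still blocked `timeout` seconds after SIGTERM."""
    process = spawn_stubborn_child()
    process.terminate()  # SIGTERM, which the child ignores
    try:
        process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Clean up with SIGKILL, which cannot be ignored.
        process.kill()
        process.communicate()
        return True
    return False


if __name__ == "__main__":
    print(communicate_hangs_after_sigterm())
```

Without the timeout argument, the first communicate() call here would block until the child's sleep finished.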

Vincent Ladeuil (vila) wrote :

https://bazaar.launchpad.net/~udd/udd/import-scripts/view/head:/udd/tests/test_threads.py has tests for monitoring subprocesses using SIGTERM and SIGKILL.

The corresponding code has been running in production for years without issues in that specific area (simple but robust subprocess tracking).
