UTAH ignores SIGTERM

Bug #1160247 reported by Didier Roche-Tolomelli
This bug affects 1 person
Affects: UTAH
Status: Fix Released
Importance: High
Assigned to: Javier Collado

Bug Description

When a Jenkins job is killed, SIGTERM is sent to UTAH (jcollado).

However, UTAH seems to ignore it, so every further run of a Jenkins job that uses the same UTAH job fails (the machine stays locked in the database).

Tags: ue-desktop

Javier Collado (javier.collado) wrote :

Looking at the log, it seems that the signal was handled and the cleanup
methods were called. However, the timestamps don't match, so maybe the signal
wasn't handled at all, but then it looks like nothing from the run that failed
was logged, which is also quite weird.

2013-03-25 14:27:06,869 dx-autopilot-nvidia DEBUG: Changing permissions of /tmp/dx-autopilot-nvidia_nhksAu/initrd.d/scripts/init-premount/brltty
2013-03-25 14:27:06,869 dx-autopilot-nvidia DEBUG: Changing permissions of /tmp/dx-autopilot-nvidia_nhksAu/initrd.d/scripts/init-premount/ORDER
2013-03-25 14:27:06,869 dx-autopilot-nvidia DEBUG: Changing permissions of /tmp/dx-autopilot-nvidia_nhksAu/initrd.d/scripts/init-premount/lvm2
2013-03-25 14:27:06,869 dx-autopilot-nvidia DEBUG: Recursively Removing directory /tmp/dx-autopilot-nvidia_nhksAu
2013-03-25 14:27:07,369 dx-autopilot-nvidia DEBUG: Running cleanup
2013-03-26 08:47:47,901 root DEBUG: Logging already configured. Skipping logging configuration.
2013-03-26 08:47:47,976 root DEBUG: Executing SQL statement: SELECT COUNT(*) FROM machines, []
2013-03-26 08:47:47,988 root DEBUG: Executing SQL statement: SELECT * FROM machines WHERE name=?, ['dx-autopilot-nvidia']
2013-03-26 08:47:47,988 root ERROR: Exception: All machines meeting criteria are currently unavailable

Changed in utah:
status: New → Confirmed
tags: added: ue-desktop
Javier Collado (javier.collado) wrote :

Running a few tests locally, as suggested by Andy, I've verified that Jenkins
considers the job aborted as soon as it has sent the SIGTERM signal. This
means that if the process ignores the signal, it may stay alive and cause
conflicts like the one in this case (physical machine provisioned in the inventory).
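
To illustrate the failure mode, here is a minimal sketch of a process that reacts to SIGTERM and still releases a resource before exiting. It is not UTAH's actual fix: the names `Terminated`, `run_job`, `work` and `release` are hypothetical, and `release` stands in for whatever step frees the machine in the inventory.

```python
import os
import signal


class Terminated(Exception):
    """Hypothetical exception raised in the main thread when SIGTERM arrives."""


def _raise_terminated(signum, frame):
    # Turn the signal into an exception so that normal try/finally
    # cleanup runs instead of the signal being ignored.
    raise Terminated(signum)


def run_job(work, release):
    """Run work() and always call release(), even if SIGTERM arrives.

    Both callables are placeholders (assumptions for this sketch).
    Must be called from the main thread, because signal.signal()
    can only install handlers there.
    """
    previous = signal.signal(signal.SIGTERM, _raise_terminated)
    try:
        work()
        return 0
    except Terminated:
        # 128 + signal number is the conventional shell exit status.
        return 128 + signal.SIGTERM
    finally:
        release()  # e.g. mark the machine as available again
        signal.signal(signal.SIGTERM, previous)
```

With a handler like this, a SIGTERM sent by the job runner interrupts `work()` and the `finally` block still runs, so the resource is not left locked for the next run.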

Andy Doan (doanac)
Changed in utah:
importance: Undecided → High
assignee: nobody → Javier Collado (javier.collado)
Changed in utah:
status: Confirmed → In Progress
status: In Progress → Fix Committed
Didier Roche-Tolomelli (didrocks) wrote :

Is it going to be deployed? It seems veebers hit the same issue on http://10.97.0.1:8080/job/ps-generic-autopilot-release-testing/33/

Andy Doan (doanac) wrote :

We are going to make a release of UTAH this week that will include this fix. It probably won't go into production until Monday.

Javier Collado (javier.collado) wrote :

I've marked the bug as Fix Released. Note that this means that the bug is fixed
in the daily PPA, but doesn't necessarily mean it's fixed in the stable PPA. In
case of doubt, you'll find that information in the e-mail that is sent when a
new stable version is released or in the changelog from the package itself.

Changed in utah:
status: Fix Committed → Fix Released