Log information about process that has taken a machine

Bug #1158743 reported by Javier Collado
Affects: UTAH
Status: Won't Fix
Importance: Medium
Assigned to: Andy Doan
Milestone: (none listed)

Bug Description

When UTAH fails to allocate a physical machine for testing, it writes
something like this to the log:

2013-03-22 07:56:10,940 root DEBUG: Executing SQL statement: SELECT COUNT(*) FROM machines, []
2013-03-22 07:56:10,955 root DEBUG: Executing SQL statement: SELECT * FROM machines WHERE name=?, ['dx-autopilot-intel']
2013-03-22 07:56:10,955 root ERROR: Exception: All machines meeting criteria are currently unavailable

https://jenkins.qa.ubuntu.com/job/ps-unity-100scopes-experimental-autopilot-release-testing/label=autopilot-intel/11/console

Troubleshooting this problem is hard because two things should make it
impossible:
1) The way jenkins is configured, no more than one job is launched at a time
   on a physical machine, that is, only one process at a time tries to run
   tests on the same physical machine.
2) Even if UTAH failed to clean up the inventory database, it checks whether
   there's a running process with the PID stored in the database whose command
   line contains 'run_test' or 'utah'.

The only way for both 1) and 2) to hold is that there's a running process
that didn't allocate the machine, but has the PID stored in the database and
'run_test' or 'utah' in its command line.
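
The check described in 2) can be sketched roughly as follows. This is only an
illustration of the idea, not UTAH's actual code: the function names are
hypothetical, and reading /proc/<pid>/cmdline is an assumption that only works
on Linux.

```python
MARKERS = ('run_test', 'utah')  # substrings named in this report


def read_cmdline(pid):
    """Return the command line of `pid`, or None if there is no such process.

    Assumption: Linux, where /proc/<pid>/cmdline holds NUL-separated arguments.
    """
    try:
        with open('/proc/{}/cmdline'.format(pid), 'rb') as f:
            raw = f.read()
    except (IOError, OSError):
        return None
    return raw.replace(b'\0', b' ').decode('utf-8', 'replace').strip()


def cmdline_matches(cmdline, markers=MARKERS):
    """True if the command line contains any of the marker substrings."""
    return any(marker in cmdline for marker in markers)


def lock_looks_valid(pid, markers=MARKERS):
    """True if `pid` is running and its command line looks like a UTAH run."""
    cmdline = read_cmdline(pid)
    return cmdline is not None and cmdline_matches(cmdline, markers)
```

A substring match like this accepts any process that happens to reuse the PID
and mention 'utah' somewhere in its arguments, which is the unlikely case
described above.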

I'd say this is unlikely, so to help troubleshoot this problem, I'd like to
log information about the process that is locking the machine in the
inventory. That way it can at least be checked from the command line whether
that process is really running test cases on the same physical machine (which
would mean that 1) isn't true).
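
A minimal sketch of the kind of logging requested here, assuming Linux and the
standard logging module. The function names, the log message format and the
fields collected are all hypothetical; none of this is taken from UTAH.

```python
import logging
import os


def describe_process(pid):
    """Collect basic facts about `pid` from /proc (assumption: Linux only)."""
    proc_dir = '/proc/{}'.format(pid)
    info = {'pid': pid, 'running': os.path.isdir(proc_dir),
            'cmdline': None, 'cwd': None}
    if not info['running']:
        return info
    try:
        with open(os.path.join(proc_dir, 'cmdline'), 'rb') as f:
            raw = f.read()
        info['cmdline'] = raw.replace(b'\0', b' ').decode('utf-8', 'replace').strip()
        info['cwd'] = os.readlink(os.path.join(proc_dir, 'cwd'))
    except (IOError, OSError):
        pass  # process exited or access was denied; keep what was collected
    return info


def log_lock_holder(machine_name, pid, logger=None):
    """Log who holds the lock on `machine_name` and return the collected info."""
    logger = logger or logging.getLogger()
    info = describe_process(pid)
    logger.error('Machine %s is locked by pid %s (running=%s, cmdline=%r, cwd=%r)',
                 machine_name, info['pid'], info['running'],
                 info['cmdline'], info['cwd'])
    return info
```

With a line like this in the log, the PID can be compared against `ps` output
on the host to see whether it really belongs to another test run.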

Andy Doan (doanac)
Changed in utah:
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Andy Doan (doanac)
Javier Collado (javier.collado) wrote :

I think the root cause of the problem is probably bug #1160247, that is:
- jenkins sends SIGTERM to process
- process ignores it
- in the next run the machine is still locked
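
One common way to avoid this sequence in Python is to turn SIGTERM into an
exception so that cleanup code still runs. The sketch below only illustrates
that general technique; it is not the fix applied in the other bug, and
`run_with_cleanup`, `work` and `release` are hypothetical names.

```python
import signal


class Terminated(Exception):
    """Raised in the main thread when SIGTERM arrives."""


def _raise_terminated(signum, frame):
    raise Terminated('received signal {}'.format(signum))


def run_with_cleanup(work, release):
    """Run work() and always call release(), even if SIGTERM arrives.

    Must be called from the main thread, because signal.signal() only
    works there.
    """
    previous = signal.signal(signal.SIGTERM, _raise_terminated)
    try:
        return work()
    finally:
        signal.signal(signal.SIGTERM, previous)  # restore the old handler
        release()  # e.g. unlock the machine in the inventory
```

Here `release` stands for whatever unlocks the machine in the inventory, so the
next run does not find it still locked.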

Javier Collado (javier.collado) wrote :

Looking at the code that was finally merged, it doesn't seem that what was
requested in the bug description was really done. However, the root cause of
the problem was identified and fixed in bug #1160247, so it's safe to set this
one to "Won't Fix".

Changed in utah:
status: In Progress → Won't Fix