Log information about process that has taken a machine
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
UTAH | Won't Fix | Medium | Andy Doan | |
Bug Description
When UTAH fails to allocate a physical machine for testing, it prints to the
log something like this:
2013-03-22 07:56:10,940 root DEBUG: Executing SQL statement: SELECT COUNT(*) FROM machines, []
2013-03-22 07:56:10,955 root DEBUG: Executing SQL statement: SELECT * FROM machines WHERE name=?, ['dx-autopilot-
2013-03-22 07:56:10,955 root ERROR: Exception: All machines meeting criteria are currently unavailable
Trying to troubleshoot this problem is hard because:
1) The way Jenkins is configured, no more than one job is launched at a time
on a physical machine, that is, only one process at a time is trying to run
tests on a given physical machine.
2) Even if UTAH failed to clean up the inventory database, it checks whether
there's a running process with the PID stored in the database whose command
line contains 'run_test' or 'utah'.
The only way for both 1) and 2) to hold and still get this error would be that
there's a running process that didn't allocate the machine, but has 'run_test'
or 'utah' in its command line.
I'd say this is unlikely, so to help troubleshoot this problem, I'd like to log
the information about the process that is locking the machine in the
inventory. That way it can at least be checked from the command line whether
that process is really running test cases on the same physical machine (which
would mean 1) isn't true).
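A minimal sketch of the kind of logging proposed above, not the code from the linked branches. It assumes a Linux host (it reads /proc/<pid>/cmdline), and the function names describe_process, is_utah_process and log_lock_owner are hypothetical, chosen only for illustration:

```python
import logging


def describe_process(pid):
    """Return the command line of a running process, or None if it is gone.

    Linux-only assumption: the command line is read from /proc/<pid>/cmdline.
    """
    try:
        with open('/proc/{}/cmdline'.format(pid), 'rb') as cmdline_file:
            raw = cmdline_file.read()
    except (IOError, OSError):
        # No such process (or no /proc at all): treat it as not running
        return None
    # Arguments are NUL-separated in /proc/<pid>/cmdline
    args = [arg.decode('utf-8', 'replace') for arg in raw.split(b'\0') if arg]
    return ' '.join(args)


def is_utah_process(cmdline):
    """Apply the check described in 2): 'run_test' or 'utah' in the command line."""
    return cmdline is not None and ('run_test' in cmdline or 'utah' in cmdline)


def log_lock_owner(machine_name, pid):
    """Log who holds the machine; return True if the lock looks legitimate."""
    cmdline = describe_process(pid)
    if is_utah_process(cmdline):
        logging.error('Machine %s is locked by pid %s: %s',
                      machine_name, pid, cmdline)
        return True
    logging.debug('Machine %s has a stale lock from pid %s (command line: %r)',
                  machine_name, pid, cmdline)
    return False
```

With a message like this in the log, the PID and command line can be compared against the output of ps on the host to see whether the lock holder is really a test run.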
Related branches
- Javier Collado (community): Approve
- Max Brustkern (community): Approve
Diff: 135 lines (+80/-10), 4 files modified:
debian/changelog (+3/-0)
tests/test_process.py (+44/-0)
utah/process.py (+25/-0)
utah/provisioning/baremetal/inventory.py (+8/-10)
- UTAH Dev: Pending requested
Diff: 74 lines (+26/-9), 2 files modified:
debian/changelog (+6/-0)
utah/provisioning/baremetal/inventory.py (+20/-9)
- Javier Collado (community): Approve
- Max Brustkern (community): Approve
Diff: 12 lines (+1/-1), 1 file modified:
utah/process.py (+1/-1)
Changed in utah:
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → Andy Doan (doanac)
I think the root cause of the problem is probably bug 1160247, that is:
- Jenkins sends SIGTERM to the process
- the process ignores it
- in the next run the machine is still locked
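To illustrate the sequence above, here is a rough sketch of one way a process could react to SIGTERM so that its machine lock is released. It is not taken from UTAH or from the fix for that bug; MachineLock and run_with_lock are hypothetical names, and the lock is a plain in-memory stand-in for the inventory entry:

```python
import signal


class MachineLock(object):
    """Hypothetical stand-in for a machine entry in the inventory."""

    def __init__(self):
        self.locked = False

    def acquire(self):
        self.locked = True

    def release(self):
        self.locked = False


def _raise_on_sigterm(signum, frame):
    # Turn SIGTERM into an exception so that finally blocks get to run
    raise SystemExit('Terminated by signal {}'.format(signum))


def run_with_lock(lock, job):
    """Run job() while holding the lock, releasing it even on SIGTERM.

    Must be called from the main thread, since signal.signal requires it.
    """
    previous = signal.signal(signal.SIGTERM, _raise_on_sigterm)
    lock.acquire()
    try:
        job()
    finally:
        lock.release()
        # Restore the earlier handler if Python knows what it was
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
```

The idea is that a terminated job unwinds through the finally block instead of leaving the lock behind for the next run. A process killed with SIGKILL would still leave a stale entry, so the PID check on the next run remains useful.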