MAAS fails to power up machines when trying to install nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | High | Gavin Panella |
1.2 | Fix Released | High | Raphaël Badin |
maas (Ubuntu) | Fix Released | High | Unassigned |
Precise | Fix Released | High | Unassigned |
Quantal | Won't Fix | High | Unassigned |
Raring | Fix Released | High | Unassigned |
Bug Description
Integration tests on raring (package built from trunk) have been failing since Apr 19, 2013 11:31:01 PM.
http://
http://
MAAS fails to power up machines when installing a node. This does not happen every time, so the problem looks like a race condition.
[Impact]
This affects the corner case in which MAAS tells nodes to start before they have finished shutting down. The fix lets MAAS tell a node to perform the power action regardless of its current state.
This is a corner case because MAAS ensures that all nodes are turned off before deploying them. However, a node may not have finished powering off after commissioning when it is told to deploy.
[Test Case]
To reproduce, simply do the following:
1. Install maas and enlist/commission IPMI nodes.
2. Manually turn on one of the nodes.
3. With maas, deploy the node.
4. With the fix, the node will be rebooted and the installation will proceed. Without the fix the installation will never start.
[Regression Potential]
Minimal. This change has been tested in the Lab and through manual testing. It ensures that the machine gets powered off/on regardless of its current power state, so MAAS can always perform the action when requested. The Server Team and MAAS team are committed to providing appropriate fixes in the event of a regression.
Related branches
- Raphaël Badin (community): Approve
- Andres Rodriguez: Pending requested
- Diff: 55 lines (+6/-24), 2 files modified
src/provisioningserver/power/templates/ipmi.template (+6/-14)
src/provisioningserver/power/tests/test_poweraction.py (+0/-10)
- Raphaël Badin (community): Approve
- Diff: 68 lines (+7/-25), 3 files modified
src/provisioningserver/power/templates/ipmi.template (+6/-14)
src/provisioningserver/power/tests/test_poweraction.py (+0/-10)
versions.cfg (+1/-1)
Changed in maas: | |
assignee: | nobody → Gavin Panella (allenap) |
status: | Triaged → Fix Committed |
Changed in maas (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → High |
description: | updated |
description: | updated |
Changed in maas: | |
status: | Fix Committed → Fix Released |
Changed in maas (Ubuntu Precise): | |
importance: | Undecided → High |
Changed in maas (Ubuntu Quantal): | |
importance: | Undecided → High |
Changed in maas (Ubuntu Raring): | |
importance: | Undecided → High |
tags: | added: verification-done; removed: verification-needed |
After investigating the issue, we found that the fix that landed in https://code.launchpad.net/~vanhoof/maas/ipmi-state-fix_lp1086160/+merge/159714 (fix for bug 1086160) is responsible for the problem: it uncovered a bug in how MAAS deals with IPMI.
Before bug 1086160 was fixed, the ipmi template *always* issued the power command (because get_power_state() was broken). Now that we check the state of the node before powering it up, if the node is being brought down but is still up when get_power_state() is called, the ipmipower command won't be issued.
This is an example of what happens: right after "--off" is issued, the node is still up and thus "--stat" returns "on":
ubuntu@lenovo-RD230-01:~$ ipmipower -h 192.168.22.33 -u root -p ubuntu --off && ipmipower -h 192.168.22.33 -u root -p ubuntu --stat
192.168.22.33: ok <- this is the result of the "--off" command
192.168.22.33: on <- this is the result of the "--stat" command
ipmipower is clever enough to understand that, if "--on" is issued while the node is being powered down, the node needs to be powered up after it has gone down:
ubuntu@lenovo-RD230-01:~$ ipmipower -h 192.168.22.33 -u root -p ubuntu --off && ipmipower -h 192.168.22.33 -u root -p ubuntu --on
=> the node is powered down *then up*.
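To make the race easier to see, here is a toy simulation in Python. It is only a sketch: `FakeBMC`, `power_on_checked`, `power_on_unconditional` and `run_scenario` are hypothetical names invented for illustration, and the model simply assumes the behaviour observed above (the state still reads "on" while the node is going down, and an "on" request received during shutdown is honoured once the node is down).

```python
class FakeBMC:
    """Toy model of a BMC whose node takes a few ticks to power down."""

    def __init__(self, shutdown_ticks=3):
        self.state = "on"            # node is still up after commissioning
        self.boots = 0               # number of off -> on transitions seen
        self.shutdown_ticks = shutdown_ticks
        self._ticks_left = 0         # > 0 while a power-off is in progress
        self._on_pending = False     # "on" requested during the shutdown

    def off(self):
        if self.state == "on" and self._ticks_left == 0:
            self._ticks_left = self.shutdown_ticks
        return "ok"

    def on(self):
        if self._ticks_left > 0:
            # Assumed behaviour: remember the request until the node is down.
            self._on_pending = True
        elif self.state == "off":
            self.state = "on"
            self.boots += 1
        return "ok"

    def stat(self):
        # Still reports "on" until the shutdown has completed.
        return self.state

    def tick(self):
        """Advance time by one step."""
        if self._ticks_left == 0:
            return
        self._ticks_left -= 1
        if self._ticks_left == 0:
            self.state = "off"
            if self._on_pending:
                self._on_pending = False
                self.state = "on"
                self.boots += 1


def power_on_checked(bmc):
    """Check the state first, then act (the racy variant)."""
    if bmc.stat() != "on":
        bmc.on()


def power_on_unconditional(bmc):
    """Always issue the power-on request."""
    bmc.on()


def run_scenario(power_on):
    bmc = FakeBMC()
    bmc.off()                        # end of commissioning
    power_on(bmc)                    # deploy requested right away
    for _ in range(bmc.shutdown_ticks):
        bmc.tick()
    return bmc.stat(), bmc.boots


print(run_scenario(power_on_checked))
print(run_scenario(power_on_unconditional))
```

In this model the checked variant leaves the node off and never booted, while the unconditional variant ends with the node on after one power cycle.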
In conclusion, we should probably revert to the old behavior and not check the return value of "--stat" at all, just issue the --on/--off command. (Note that MAAS executes these ipmi commands asynchronously [using celery], so we cannot use ipmipower's --wait-until-on/--wait-until-off options to solve this problem.)
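As a rough illustration of that proposal (not the actual ipmi.template change), the power action could be issued without consulting "--stat" first. The sketch below is in Python; `build_ipmipower_command` and `issue_power_action` are hypothetical helpers, and only the ipmipower flags already shown in this report (-h, -u, -p, --on, --off) are used.

```python
import subprocess

# Map the requested action to the ipmipower flag; no state check is involved.
ACTION_FLAGS = {"on": "--on", "off": "--off"}


def build_ipmipower_command(host, user, password, action):
    """Return the argv for an unconditional ipmipower power action."""
    try:
        flag = ACTION_FLAGS[action]
    except KeyError:
        raise ValueError("unknown power action: %r" % (action,))
    return ["ipmipower", "-h", host, "-u", user, "-p", password, flag]


def issue_power_action(host, user, password, action):
    """Run ipmipower and return its output (requires ipmipower installed)."""
    command = build_ipmipower_command(host, user, password, action)
    result = subprocess.run(
        command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, build_ipmipower_command("192.168.22.33", "root", "ubuntu", "on") yields the same command line as the "--on" half of the manual test above.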