No retries for AMT power?
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
MAAS | Fix Released | Critical | Raphaël Badin |
Bug Description
I had an AMT-controlled node which failed to power on; the event log is in https:/
Whilst debugging I looked at the AMT power template in MAAS 1.7.0~beta3+
There is a comment saying that retries are handled by the "core power driver" - is that accurate? Assuming it is, why is there this:
    query_state() {
        # Retry the state if it fails because it often fails the first time.
        local state=
        local count=
        state=
        if [ -n "$state" ]
        then
            break
        fi
        # Wait 1 second between queries.  AMT controllers are generally very
        # light and may not be comfortable with more frequent queries.
        sleep 1
        case "$state" in
        S[0-4])
            # Wide awake (S0), or asleep (S1-S4), but not a clean slate that
            # will lead to a fresh boot.
            echo 'on'
            ;;
        S5)
            echo 'off'
            ;;
        *)
            fail 2 "Got unknown power state from node: '$state'"
            ;;
        esac
    }
Note the break without any enclosing loop.
Note also that the sleep happens /after/ querying the AMT controller, rather than between queries.
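For illustration, here is a minimal sketch of what the quoted fragment appears to be aiming for: a bounded loop around the query, with the pause placed between attempts rather than after the last one. The real query command is elided in the snippet above, so the function takes it as an argument. The name `query_state_with_retry`, its `max_attempts` and `delay` parameters, and the use of `return 2` in place of the template's `fail` helper are all assumptions made for this sketch; it is not the code that landed in the fix.

```shell
# Sketch only: retry a power-state query a bounded number of times.
# Usage: query_state_with_retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# COMMAND is a hypothetical stand-in for the elided AMT query; it is
# expected to print a state such as S0..S5 on success.
query_state_with_retry() {
    local max_attempts="$1"
    local delay="$2"
    shift 2
    local state=
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Treat a non-zero exit status as "no answer", even if it printed.
        state=$("$@" 2>/dev/null) || state=
        if [ -n "$state" ]; then
            break
        fi
        # Pause only between attempts, never after the final one.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    case "$state" in
        S[0-4])
            # Awake (S0) or asleep (S1-S4): not a clean slate.
            echo 'on'
            ;;
        S5)
            echo 'off'
            ;;
        *)
            # Assumption: report on stderr and return 2, standing in for
            # the template's own `fail 2 ...` helper.
            echo "Got unknown power state from node: '$state'" >&2
            return 2
            ;;
    esac
}
```

Called as `query_state_with_retry 5 1 some_query_command` (where `some_query_command` is a placeholder), this would make up to five attempts, one second apart, and only give up with exit status 2 once every attempt has come back empty.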
What is the API between templates and the power drivers to have the template say "please retry me"? Is it just fail?
Related branches
- Jeroen T. Vermeulen (community): Approve
  Diff: 42 lines (+1/-10), 1 file modified: etc/maas/templates/power/amt.template (+1/-10)
- Gavin Panella (community): Approve
  Diff: 86 lines (+36/-20), 1 file modified: etc/maas/templates/power/amt.template (+36/-20)
summary: AMT power template strangeness → No retries for AMT power?
description: updated
Changed in maas:
assignee: nobody → Raphaël Badin (rvb)
importance: High → Critical
Changed in maas:
status: Triaged → Fix Committed
Changed in maas:
milestone: none → 1.7.0
Changed in maas:
status: New → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released
https://pastebin.canonical.com/117626/ is the output from a failed power-on event; it is what led me to look at the template to see whether it was still doing retries.