IPMI commands are sent / queried too fast

Bug #1320513 reported by Robert Collins
This bug affects 2 people
Affects                   Status        Importance  Assigned to   Milestone
Ironic                    Fix Released  High        Chris Krelle
OpenStack Compute (nova)  Invalid       Low         Unassigned

Bug Description

http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/second-gen-interface-spec-v2.pdf has this in it:
---
1.7.32 Configuration Interfaces
...
In some implementations, changes to configuration parameters may take effect immediately. Thus, a remote application should be careful when setting parameters that could cause the application to become disconnected from the BMC.

For the purpose of conformance checking, up to 5 seconds will be allowed between the time a parameter is changed to when it must have taken effect.
---

We've seen repeated cases of BMCs locking up or getting confused under high-frequency polling. It might be an idea to wait 5 seconds between polls (the maximum time the spec allows between a change and its taking effect) rather than using the polling interval we use today.
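A minimal sketch of the suggestion above, in Python: poll a node's power state no more often than once every 5 seconds. The function name, the `get_state` callable, and the 60-second timeout are hypothetical and only for illustration; this is not the code Ironic uses.

```python
import time


def wait_for_power_state(get_state, target, interval=5.0, timeout=60.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns target or the timeout expires.

    get_state is a caller-supplied callable (hypothetical here) that asks
    the BMC for the current power state. The clock and sleep arguments
    exist only so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == target:
            return True
        # Give up rather than schedule a poll that would land past the deadline.
        if clock() + interval > deadline:
            return False
        # Leave the BMC alone for the full interval before asking again.
        sleep(interval)
```

With the defaults, the BMC is queried at most 13 times over a minute, whatever the caller's own polling settings are.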

Tags: ipmi baremetal
Changed in ironic:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ironic (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96558

Changed in ironic:
assignee: nobody → Chris Krelle (nobodycam)
status: Confirmed → In Progress
aeva black (tenbrae) wrote :

Marking as "High" because the workaround for this bug, once a user is bitten by it, is to have someone in the datacenter physically power-cycle the machine.

Changed in ironic:
importance: Undecided → High
tags: added: baremetal
aeva black (tenbrae) wrote :

Tagging Nova as the same code is present in nova/virt/baremetal/ipmi.py:

    def _exec_ipmitool(self, command):
        ...
        try:
            args.append(pwfile)
            args.extend(command.split(" "))
            out, err = utils.execute(*args, attempts=3)

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/96902

Changed in ironic:
assignee: Chris Krelle (nobodycam) → Devananda van der Veen (devananda)
Changed in ironic:
assignee: Devananda van der Veen (devananda) → Chris Krelle (nobodycam)
Changed in ironic:
assignee: Chris Krelle (nobodycam) → Devananda van der Veen (devananda)
Michael Still (mikal) wrote :

@devananda -- are you going to propose a fix for baremetal as well, or shall we "won't fix" this in favour of ironic?

Changed in ironic:
assignee: Devananda van der Veen (devananda) → Chris Krelle (nobodycam)
Changed in ironic:
assignee: Chris Krelle (nobodycam) → Devananda van der Veen (devananda)
Michael Davies (mrda) wrote :

I'll update the baremetal side of this based upon the solution ironic comes up with.

Changed in nova:
status: New → In Progress
assignee: nobody → Michael Davies (mrda)
Changed in ironic:
assignee: Devananda van der Veen (devananda) → Chris Krelle (nobodycam)
aeva black (tenbrae)
tags: added: ipmi
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ironic (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/99121

Changed in ironic:
assignee: Chris Krelle (nobodycam) → Devananda van der Veen (devananda)
aeva black (tenbrae)
Changed in ironic:
milestone: none → juno-2
Changed in ironic:
assignee: Devananda van der Veen (devananda) → Chris Krelle (nobodycam)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ironic (master)

Reviewed: https://review.openstack.org/99121
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=6318ee1dd1d758c799a3cf09d0736de5f07bdd72
Submitter: Jenkins
Branch: master

commit 6318ee1dd1d758c799a3cf09d0736de5f07bdd72
Author: Devananda van der Veen <email address hidden>
Date: Tue Jun 10 07:46:44 2014 -0700

    Stop ipmitool.validate from touching the BMC

    Stop the IPMITool driver from calling 'mc guid' in validate().

    Validate is currently called synchronously when API requests are sent to
      GET /v1/node/NNN/validate

    While work is ongoing to make the API more asynchronous, this presents
    a particular issue in that a user can spam this URL and overwhelm the
    hardware node's BMC.

    Furthermore, validate() is called internally in several places, which is
    further contributing to BMC instability as reported in the related
    bug 1320513.

    Change-Id: I2414d2b07e2ab86c85ca18bc033368ddf43f7f43
    Closes-bug: #1314954
    Related-bug: #1320513

Changed in ironic:
assignee: Chris Krelle (nobodycam) → Devananda van der Veen (devananda)
Changed in ironic:
assignee: Devananda van der Veen (devananda) → Chris Krelle (nobodycam)
OpenStack Infra (hudson-openstack) wrote : Fix merged to ironic (master)

Reviewed: https://review.openstack.org/96902
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=12ef2bc621f6c713524d54a157bc3fe216b04977
Submitter: Jenkins
Branch: master

commit 12ef2bc621f6c713524d54a157bc3fe216b04977
Author: Devananda van der Veen <email address hidden>
Date: Fri May 30 11:59:54 2014 -0700

    Let ipmitool natively retry commands

    Instead of calling ipmitool multiple times on failure via
      utils.execute(*args, attempts=3)
    allow ipmitool to use its own native retry behavior with -N.. -R..
    if those options are supported by the installed version of ipmitool.
    This will fall back to a single run of ipmitool on older versions,
    which should be fine -- it defaults to retry several times anyway.

    This patch adds a configurable min time between retries, which is used,
    in conjunction with the ipmi retry time, to determine these options'
    values. It will be further leveraged in a subsequent patch as well.

    It also adds a note in the deployer docs about known issues with
    the openipmi project.

    Change-Id: I7a4ff941144a03bd441459561efb68760391da1a
    Partial-bug: #1320513
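
For illustration, a rough sketch of how the two settings described in the commit message above could be turned into ipmitool's -R (retry count) and -N (seconds between retransmissions) options. The function name, the parameter names `retry_timeout` and `timing_supported`, the default values, and the integer division are assumptions made for this example; the merged patch may compute the values differently.

```python
def build_ipmitool_args(address, username, pwfile, command,
                        retry_timeout=60, min_command_interval=5,
                        timing_supported=True):
    """Build an ipmitool argument list that relies on native retries.

    retry_timeout and min_command_interval are in seconds. The names and
    defaults are illustrative assumptions, not Ironic's actual options.
    """
    args = ["ipmitool", "-I", "lanplus", "-H", address, "-U", username]
    if timing_supported:
        # Spread the retries across the overall timeout, at least one try.
        num_tries = max(retry_timeout // min_command_interval, 1)
        args += ["-R", str(num_tries), "-N", str(min_command_interval)]
    # Older ipmitool versions skip -R/-N and fall back to built-in defaults.
    args += ["-f", pwfile]
    args += command.split(" ")
    return args
```

For example, a 60-second retry timeout with a 5-second minimum interval yields `-R 12 -N 5`, so ipmitool is invoked once and handles the retransmissions itself instead of being re-run up to three times by the caller.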

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/96558
Committed: https://git.openstack.org/cgit/openstack/ironic/commit/?id=53640a873000a12eebd627df555e7b2b0a27f659
Submitter: Jenkins
Branch: master

commit 53640a873000a12eebd627df555e7b2b0a27f659
Author: Chris Krelle <email address hidden>
Date: Thu May 29 14:22:06 2014 -0500

    Enforce a minimum time between all IPMI commands

    This patch enforces the min_command_interval option, which was added in
    the previous patch, to ensure that any given BMC is never "poked" more
    frequently than this interval, regardless of the calling method.

    Closes-Bug: #1320513

    Change-Id: Id3849aadf3908133a92157b3e96dd752610533e9
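
A minimal sketch of the idea in the commit message above: remember when each BMC was last contacted and sleep until the minimum interval has passed before sending the next command. The class name, method name, and use of the BMC address as the key are hypothetical; this is not the code from the merged patch, and it is not thread-safe as written.

```python
import time


class BmcThrottle:
    """Ensure a minimum interval between commands sent to the same BMC."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        # clock and sleep are injectable so the class can be tested quickly.
        self._clock = clock
        self._sleep = sleep
        self._last_call = {}  # BMC address -> time of the last command

    def wait(self, address):
        """Block until address may be contacted again; return seconds slept."""
        last = self._last_call.get(address)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self._min_interval - (self._clock() - last))
        if delay > 0:
            self._sleep(delay)
        # Record the time at which the caller is now free to send the command.
        self._last_call[address] = self._clock()
        return delay
```

A caller would invoke `throttle.wait(address)` immediately before every ipmitool run, so the limit holds regardless of which code path triggers the command. A real implementation shared across threads would also need a lock around the bookkeeping.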

Changed in ironic:
status: In Progress → Fix Committed
Michael Still (mikal)
Changed in nova:
importance: Undecided → Low
Michael Davies (mrda)
Changed in nova:
status: In Progress → Confirmed
assignee: Michael Davies (mrda) → nobody
Changed in ironic:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in ironic:
milestone: juno-2 → 2014.2
Michael Still (mikal) wrote :

We don't have a baremetal driver in nova any more.

Changed in nova:
status: Confirmed → Invalid