Comment 3 for bug 1504023

aeva black (tenbrae) wrote :

I am reopening this bug because I have confirmed that it is not fixed adequately, and I am still getting timeout errors against a 5th-gen NUC.

After some investigation, here is the root cause: it typically takes the AMT ME on my NUC about 3 seconds to wake up from a low power state. Sometimes it takes longer, and when this happens, whatever command was requested (get power state, set power state, etc.) will fail.

According to the Intel AMT ME docs, it can take up to 25 seconds for AMT to wake up from a low power state, and ping shouldn't be used to wake up the interface:

"If the ME is set to respond to Pings, ping the client before the action. Note: There are situations where a ping will not reply for the first 2-3 times. You would not want to use this method if doing this in an automated manner."
- https://software.intel.com/en-us/wake-up-amt

Here are some logs from pinging the AMT ME that demonstrate the behaviour of ME's wakeup.

ping with 0.1s interval: http://paste.openstack.org/show/481861/
ping with 1s interval: http://paste.openstack.org/show/481863/

Note the ARP response in each case -- AMT does not reply to the ICMP echo request until after it has gotten a response to the ARP who-has request.

More important, however, is that the AMT ME is *not* caching the ARP table between ICMP sessions: at the start of every ping request, ME issues another ARP who-has, which, at least in my lab today, is taking a few seconds. This is causing the AMT driver to fail repeatedly.

For reference, I am testing with commit 64530a6c5bc8091f4960bc582318350e294fac51.

Suggested fix #1:
- document that deployers must configure their AMT devices *not* to enter a low power state

Suggested fix #2:
- whenever an AMT Node is powered off, start a background thread that issues a slow ICMP ping to prevent ME from going into a low power state
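
To make fix #2 concrete, here is a rough sketch of what such a background thread could look like. It is not taken from the driver: the class name `KeepAlive`, the `probe` callable, and the 10 second default interval are all assumptions made for illustration. The right interval depends on how quickly a given ME drops into its low power state, which I have not measured.

```python
import threading


class KeepAlive:
    """Call `probe` periodically from a daemon thread until stopped.

    Hypothetical helper; `probe` stands in for whatever sends the ping.
    """

    def __init__(self, probe, interval=10.0):
        self._probe = probe
        self._interval = interval  # assumed value, tune per device
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        # Wake the thread if it is waiting, then wait for it to exit.
        self._stop.set()
        self._thread.join()

    def is_running(self):
        return self._thread.is_alive()

    def _run(self):
        while not self._stop.is_set():
            try:
                self._probe()
            except Exception:
                # A failed probe is tolerated: ME may simply be waking up.
                pass
            # Event.wait doubles as an interruptible sleep.
            self._stop.wait(self._interval)
```

On Linux, `probe` could be something like `lambda: subprocess.call(["ping", "-c", "1", address])`, started when the Node is powered off and stopped when it is powered on again. Note the caveat from the Intel docs quoted above: the first few pings may go unanswered, so the probe's result should not be treated as a health check.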

Suggested fix #3:
- increase the timeouts within the driver for waking up ME to the Intel-recommended 25 seconds
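
For fix #3, one possible shape is a retry loop bounded by a 25 second deadline rather than a single longer socket timeout. The sketch below is only an illustration: `call_with_wakeup_retry`, `AMT_WAKEUP_TIMEOUT`, and the `retry_on` parameter are hypothetical names, not part of the existing driver, and the 1 second retry interval is an assumption.

```python
import time

# Upper bound on ME wake-up time, per the Intel docs cited above.
AMT_WAKEUP_TIMEOUT = 25.0


def call_with_wakeup_retry(action, timeout=AMT_WAKEUP_TIMEOUT, interval=1.0,
                           retry_on=(Exception,),
                           clock=time.monotonic, sleep=time.sleep):
    """Retry action() until it succeeds or `timeout` seconds have passed.

    `retry_on` should be narrowed to the driver's own timeout/connection
    exception; the broad default is only for illustration.
    """
    deadline = clock() + timeout
    while True:
        try:
            return action()
        except retry_on:
            if clock() >= deadline:
                raise  # give up and surface the last failure
            sleep(interval)
```

A caller would wrap each AMT request, for example `call_with_wakeup_retry(lambda: get_power_state(node))`, where `get_power_state` stands for whatever function issues the request. The `clock` and `sleep` parameters exist so the loop can be unit tested without real delays.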