[2.1] MAAS cannot power query with Cisco UCSM power driver

Bug #1611999 reported by Jeff Lane on 2016-08-11
This bug affects 1 person
Affects   Importance   Assigned to
MAAS      High         Newell Jensen
2.1       High         Newell Jensen
Trunk     High         Newell Jensen

Bug Description

I'm trying to set up a blade in a Cisco chassis via UCSM. I'm using MAAS 2.0 and it looks like MAAS is failing to properly probe power.

tl;dr:
MAAS can successfully turn power on and off for nodes that are part of a UCSM controlled chassis. MAAS can NOT correctly get power status from UCSM so user has to manually use Power Off from action menu in MAAS to put node in the correct state.

Here's what happens (and I'll attach screen shots showing each step).

1: System not in MAAS. UCSM shows blade power OFF.
2: Power system on in UCSM. System boots, enlists, shuts off. UCSM shows power state OFF. MAAS shows system in New state with no power type set.
3: Configure power type with username/password, UUID and API URL, and save changes.
4: MAAS checks power state and shows System in New state with Power ON, even though UCSM shows that the blade power state is OFF.
5: Clicking Check Now re-probes power and still shows as ON.
6: Manually set Power OFF in MAAS by selecting Power Off from the Action menu.
7: MAAS NOW shows power state as OFF.
8: Check Now also says power state is OFF

At this point, maas.log only shows this for the UCS node:
bladernr@lucuma:/var/log/maas$ grep unfluffed-cherise maas.log
Aug 10 18:59:06 lucuma maas.api: [INFO] unfluffed-cherise: Enlisted new machine
Aug 10 20:11:12 lucuma maas.power: [INFO] Changing power state (off) of node: unfluffed-cherise (4y3h8r)
Aug 10 20:11:14 lucuma maas.power: [INFO] Changed power state (off) of node: unfluffed-cherise (4y3h8r)

The last two are where I manually set power OFF using the Action menu.

9: I click Commission from the Action menu.
10: MAAS shows power state as ON and node state as Commissioning.
11: UCSM confirms that power has turned on, so MAAS CAN control power.
12: Commissioning completes. UCSM shows power state OFF, MAAS shows system as Ready, with power state ON.

In the MAAS log, we see where MAAS turned the power on:
Aug 10 20:14:42 lucuma maas.node: [INFO] unfluffed-cherise: Status transition from NEW to COMMISSIONING
Aug 10 20:14:42 lucuma maas.power: [INFO] Changing power state (on) of node: unfluffed-cherise (4y3h8r)
Aug 10 20:14:42 lucuma maas.node: [INFO] unfluffed-cherise: Commissioning started
Aug 10 20:14:44 lucuma maas.power: [INFO] Changed power state (on) of node: unfluffed-cherise (4y3h8r)

13: MAAS changes node state to Ready, Power remains ON

maas.log shows this:
Aug 10 20:17:52 lucuma maas.interface: [INFO] enp7s0 (physical) on unfluffed-cherise: Observed connected to fabric-1 via 10.1.10.0/23.
Aug 10 20:17:53 lucuma maas.node: [INFO] unfluffed-cherise: Storage layout was set to flat.
Aug 10 20:17:53 lucuma maas.node: [INFO] unfluffed-cherise: Status transition from COMMISSIONING to READY

Note that UCSM shows the node as powered OFF, while MAAS still shows the power ON. Once again, I have to manually select Power Off from the Action menu, and then MAAS changes state to Power Off.

maas.log shows this when I manually tell it to turn the power off:
Aug 10 20:20:15 lucuma maas.power: [INFO] Changing power state (off) of node: unfluffed-cherise (4y3h8r)
Aug 10 20:20:17 lucuma maas.power: [INFO] Changed power state (off) of node: unfluffed-cherise (4y3h8r)

Now that MAAS thinks the power is off, I can begin a deployment. MAAS acquires the node correctly, changes state to Deploying, and sets the Power state to ON... UCSM shows the node ON. This appears in the MAAS log when I do the deployment:

Aug 10 20:20:30 lucuma maas.node: [INFO] unfluffed-cherise: Status transition from READY to ALLOCATED
Aug 10 20:20:30 lucuma maas.node: [INFO] unfluffed-cherise: allocated to user bladernr
Aug 10 20:20:30 lucuma maas.interface: [INFO] Allocated automatic IP address 10.1.11.7 for enp6s0 (physical) on unfluffed-cherise.
Aug 10 20:20:30 lucuma maas.node: [INFO] unfluffed-cherise: Status transition from ALLOCATED to DEPLOYING
Aug 10 20:20:30 lucuma maas.power: [INFO] Changing power state (on) of node: unfluffed-cherise (4y3h8r)
Aug 10 20:20:33 lucuma maas.power: [INFO] Changed power state (on) of node: unfluffed-cherise (4y3h8r)
Aug 10 20:22:21 lucuma maas.preseed: [INFO] unfluffed-cherise: custom network and storage options are only supported on Ubuntu. Using flat storage layout.

Jeff Lane (bladernr) wrote :

Here are the various screenshots showing the state on UCSM and MAAS

description: updated
Changed in maas:
status: New → Incomplete
Andres Rodriguez (andreserl) wrote :

Hi Jeff,

1. What happens if you click on "Check Power" on the node details page. Does the power update correctly?

2. While you do "Check Power" can you attach /var/log/maas/*.log?

That said, looking at your statement

"Note that UCSM shows the node as powered OFF, MAAS still shows the power ON. Once again, I have to manually select Power Off from the Action menu and then maas changes state to Power Off."

MAAS does not check power every second. It checks it every 3 minutes or so. So the machine may finish commissioning, and the power may be shown as ON in MAAS, but the machine may actually be off. The reason for this is that after commissioning, a power query is requested, but what may be happening is that the power query happens while the node is still on (and powering off), and it doesn't update in the UI until it actually checks the power again.
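
The timing window described here can be illustrated with a small simulation. This is only a sketch: the `CachedPowerState` class, the 180-second interval, and the timeline are hypothetical and are not taken from the MAAS code.

```python
class CachedPowerState:
    """Hypothetical model: the UI shows the last polled value, not live state."""

    def __init__(self, poll_interval=180):
        # Roughly 3 minutes; the exact value is an assumption for this sketch.
        self.poll_interval = poll_interval
        self.shown = "unknown"
        self.last_poll = None

    def poll(self, now, query):
        """Run the query only on the first call or once the interval elapsed."""
        if self.last_poll is None or now - self.last_poll >= self.poll_interval:
            self.shown = query(now)
            self.last_poll = now
        return self.shown


def hardware_state(now, off_at=10):
    """Hardware that finishes powering off `off_at` seconds into the timeline."""
    return "off" if now >= off_at else "on"


cache = CachedPowerState()
print(cache.poll(0, hardware_state))    # queried while still shutting down
print(cache.poll(60, hardware_state))   # stale: no new query has run yet
print(cache.poll(180, hardware_state))  # next periodic check catches up
```

Under this model the displayed state should correct itself on the next periodic check, which is why a state that stays ON well past that point suggests a different cause.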

Based on the logs, it seems that the node has transitioned correctly between states and you are experiencing the above. I'll mark this bug as invalid for the time being. Please re-open the bug if the following happens:

1. Commission the node.
2. The node is in Ready state and shows power as ON (but the hardware itself is off).
3. You do one of the two:
  3.1 You click on "Check Power" and the node remains as ON (even though the hardware is off), or
  3.2 After 5 minutes, the power hasn't changed automatically.

Changed in maas:
status: Incomplete → Invalid

On Wed, Aug 10, 2016 at 8:39 PM, Andres Rodriguez
<email address hidden> wrote:
> Hi Jeff,
>
> 1. What happens if you click on "Check Power" on the node details page.
> Does the power update correctly?

No... For example, after commissioning, MAAS says the node is in Ready
state, and says Power is ON, while UCSM says the power is OFF.
Clicking the Check Power link doesn't change this. The only way I
could find to change it was to select "Power Off" from the Action
menu. THEN MAAS correctly showed the power as being Off.

> 2. While you do "Check Power" can you attach /var/log/maas/*.log?

I can... I've attached the logs. However, NOTHING appears in the logs
when I click "Check Power". I just retried several times, and there
is no log activity when I do that.

>
> That said, looking at your statement
>
> "Note that UCSM shows the node as powered OFF, MAAS still shows the
> power ON. Once again, I have to manually select Power Off from the
> Action menu and then maas changes state to Power Off."
>
> MAAS does not check power every second. It checks it every 3 minutes or
> so. So the machine may finish commissioning, and the power may be shown
> as ON in MAAS, but the machine may be actually off. The reason for this
> is that after commissioning, a power query is requested, but what may be
> happening is that the power query happens while the node is still on
> (and powering off), and it doesn't update in the UI until it actually
> checks the power again.
>
> Based on the logs, it seems that the node has transitioned correctly
> between states and you are experiencing the above. I'll mark this bug as
> invalid for the time being. Please re-open the bug if the following
> happens:
>
> 1. Commission the node.
> 2. The node is in Ready state and shows power as ON (but the hardware itself is off).
> 3. You do one of the two:
> 3.1 You click on "Check Power" and the node remains as ON (even though the hardware is off), or
> 3.2 After 5 minutes, the power hasn't changed automatically.

So I recommissioned the node. After it was done, once again, state
became "Ready" but MAAS still said the power was on. UCSM shows the
power to the blade as Off.
I clicked the "Check Power" button 5 times, and it never changed the
status in MAAS.
I checked the logs, and there was NO log activity at all to correspond
with my "Check Power" clicks.
I left the node to sit well over 5 minutes after it was marked Ready,
and the power state in MAAS still says ON.
I clicked the "Check Power" button 5 more times, and it still didn't
change the status in MAAS.
And again nothing in the logs from my clicks.

According to MAAS.log, at 22:36, the node was changed to READY, and
the power was marked as ON until 22:56 when I manually used the "Power
Off" action in the Action menu in order to set the power status
correctly so I could deploy the node and kick off testing.

Aug 10 22:36:33 lucuma maas.node: [INFO] oil-pullman-02: Storage
layout was set to flat.
Aug 10 22:36:33 lucuma maas.node: [INFO] oil-pullman-02: Status
transition from COMMISSIONING to READY
Aug 10 22:40:31 lucuma maas.power: [ERROR] Power state could not be
queried: Failed to login to virsh console.
Aug 10 2...

Changed in maas:
status: Invalid → Confirmed
Changed in maas:
importance: Undecided → High
summary: - MAAS 2 does not properly probe power state on Cisco UCS systems
+ [2.0] MAAS cannot power query with Cisco UCSM power driver

@Jeff,

Also, which MAAS version are you using again? Are you on the latest RC4?

On Thu, Aug 11, 2016 at 8:04 AM, Andres Rodriguez
<email address hidden> wrote:
> @Jeff,
>
> Also, what MAAS version are you using again ? Are you on the latest RC4?

Yes, the server is running RC4

Once I'm done running the regression testing, I'll hold onto this system so someone can take a look at it. It's an OIL system.

Thanks! I'll send Newell your way when he comes online!

On Thu, Aug 11, 2016 at 8:28 AM, Jeff Lane <email address hidden>
wrote:

> Once I'm done running the regression testing, I'll hold onto this system
> so someone can take a look at it. It's an OIL system.

--
Andres Rodriguez (RoAkSoAx)
Ubuntu Server Developer
MSc. Telecom & Networking
Systems Engineer

Steps:

1. Commissioned blade --> Ready state with power showing as ON.
2. UCSM shows blade as powered OFF.
3. Manually queried the UCSM API and blade shows as power ON.

Thus, either the UCSM UI has a bug by showing it is OFF or the UCSM API has a bug showing that it is ON (when it is actually OFF). Either way, MAAS is doing what it should.

Marking as invalid as this is not a MAAS bug.

Newell Jensen (newell-jensen) wrote :

Here are the commands that I ran against the API on lucuma:

ubuntu@lucuma:~$ sudo maas-region shell
Python 3.5.2 (default, Jul 5 2016, 12:43:10)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from provisioningserver.drivers.hardware.ucsm import power_state_ucsm
>>> uuid = '5c60b934-a0e3-4b94-b964-438230e89dca'
>>> url = 'http://192.168.224.92/'
>>> username = 'ubuntu'
>>> password = '1Nsecure!'
>>> power_state_ucsm(url, username, password, uuid)
power_control: b'<lsPower dn="org-root/ls-B200M3_SND/power" state="up"/> '
'on'

I edited the call to power_state_ucsm to print out the XML that it was getting back from the API.
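
For anyone who wants to repeat the check without patching the driver, the step from that XML to the 'on' result can be sketched roughly as follows. The helper name `ls_power_state` and the `STATE_MAP` table are hypothetical and are not the driver's actual code; only the XML payload comes from the session above.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the lsPower "state" attribute to MAAS-style states.
STATE_MAP = {"up": "on", "down": "off"}


def ls_power_state(xml_payload):
    """Return 'on' or 'off' from the first lsPower element in the payload."""
    if isinstance(xml_payload, bytes):
        xml_payload = xml_payload.decode("utf-8")
    root = ET.fromstring(xml_payload.strip())
    # iter() also yields the root itself, so a bare <lsPower/> works too.
    for element in root.iter("lsPower"):
        state = element.get("state")
        if state not in STATE_MAP:
            raise ValueError("unexpected lsPower state: %r" % state)
        return STATE_MAP[state]
    raise ValueError("no lsPower element in payload")


print(ls_power_state(b'<lsPower dn="org-root/ls-B200M3_SND/power" state="up"/> '))
```

Because the lookup walks the whole tree, the same helper should also accept a full response in which lsPower is nested inside other elements.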

Jeff Lane (bladernr) wrote :

Using Newell's comment 9 above, I did some testing on our blade chassis with the node I initially saw this on.

To investigate this, I logged into the MAAS server and re-commissioned the node. I noted the IP address used during commissioning and connected to the node and did a 'tail -f' on the cloud-init log to watch the commissioning process.

Once commissioning was completed, it looks as though the "power down" command issued in the running OS was successful. I was forcibly logged out and the machine became unresponsive to SSH requests.

Now, at this point, there are three things:
1: MAAS shows the power state as ON, but overall state as Ready.
2: The API is returning ON:
>>> power_state_ucsm(url, username, password, uuid)
up
'on'
3: UCSM UI shows power state as OFF.

So it seems the API is telling MAAS the power is still on, although the UCSM UI shows it as off.

Now... in order to resolve this issue, I must manually use the "Power Off" action from the MAAS Action menu for this node. This tells MAAS to send a Power Off command via UCSM to the node. This causes the API to finally tell us that the node is OFF.

This is the XML from the API after commissioning:
<configResolveChildren
cookie="1471266515/8e269db3-c7ff-4b26-9c13-addd9668db2c"
response="yes" classId="lsPower"> <outConfigs> <lsPower
dn="org-root/ls-B200M3_SND/power" state="up"/> </outConfigs>
</configResolveChildren>

Note, the node appears to be powered off at this time according to the UCSM UI and the unresponsiveness of the node to pings and SSH, but the API says it's still up.

After manually powering the node down, the API shows the state as Down, correctly.
<configResolveChildren
cookie="1471267730/436439b8-378d-4cc8-bfd6-fcf7bc5f1ca0"
response="yes" classId="lsPower"> <outConfigs> <lsPower
dn="org-root/ls-B200M3_SND/power" state="down"/> </outConfigs>
</configResolveChildren>

Now if I deploy the system, once I've gotten the correct power state from the API, the node powers on, and the API and UI reflect this:
<configResolveChildren
cookie="1471267817/220ac2c0-c44d-4179-871b-09bbdb347c6b"
response="yes" classId="lsPower"> <outConfigs> <lsPower
dn="org-root/ls-B200M3_SND/power" state="up"/> </outConfigs>
</configResolveChildren>

Once deployed, I have two Action options in the MAAS UI. I can Power Down or Release. If I Power Down, the state changes correctly. If I release, it ALSO changes correctly. I think this is because when I use either of those, we DIRECTLY issue a Power Down via the API.

But if I issue a power-down using "shutdown -t now" from inside the OS on the node, the API incorrectly reports the node as being powered up, and thus MAAS incorrectly reports the node as being powered up. The UCSM applet correctly shows the power as Down.

So my suspicion here is that when the node is powered down in OS, using the "shutdown" command in Ubuntu, the node DOES power off, but the API is never updated somehow. So it seems like it's very possible to get the API out of sync with reality by simply using in-band poweroff commands.
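
That suspicion can be written down as a toy model. Everything in it (the `BladeModel` class and its method names) is hypothetical and only mirrors the behaviour observed above; it is not UCSM or MAAS code.

```python
class BladeModel:
    """Toy model of the suspected behaviour; not real UCSM code."""

    def __init__(self):
        self.hardware_on = False   # what the blade is really doing
        self.api_state = "down"    # what the lsPower query reports

    def api_power(self, state):
        """A power change issued through the API updates both views."""
        self.api_state = state
        self.hardware_on = state == "up"

    def os_shutdown(self):
        """An in-band shutdown powers the hardware off but leaves api_state alone."""
        self.hardware_on = False


blade = BladeModel()
blade.api_power("up")      # MAAS powers the node on to commission it
blade.os_shutdown()        # commissioning ends with an in-OS poweroff
print(blade.hardware_on, blade.api_state)   # hardware off, API still "up"
blade.api_power("down")    # manual Power Off from the MAAS Action menu
print(blade.hardware_on, blade.api_state)   # both views agree again
```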

I also left the node sitting for 20 minutes after I issued the in-OS shutdown command and the API still said that the node was powered...


Jeff Lane (bladernr) wrote :

Just to follow up: upstream determined that this was actually a regression/bug in the UCSM API and will resolve it on their end with an update to UCSM that fixes the issue.

Andres Rodriguez (andreserl) wrote :

Thanks for the confirmation Jeff.

jeffrey leung (jefleung) wrote :

Hi,

I opened a bug on Cisco's end to check this out. There was initially a misunderstanding of the XML model on our part, but after working with development we found that MAAS was querying the wrong attribute when it checked the power status. A power change made through UCSM sets what we call the Desired Power State. I've attached a screenshot showing the status of a Service Profile under the Servers tab after commissioning finished on a new node.

The Desired Power State never changed, because MAAS used UCSM to power on and commission the node but the shutdown was issued from within the OS. This is why a power check through MAAS still returned Power On even though the overall status in UCSM showed Power Off.

Continue to use lsPower for controlling the power state, but to check the actual power status it's recommended to use the operState under the lsServer classID. I copied the XML from a service profile with the issue and the lsPower state was still up, but the lsServer operState showed as powered off:

lsServer
operState="power-off"

-Jeff
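
The distinction between the two attributes can be sketched as follows. This is an illustration under stated assumptions, not the eventual MAAS fix: the function names are hypothetical, the lsServer element is trimmed to a single attribute with an assumed dn, and "power-off" is the only operState value known from this report, so treating every other value as 'on' is a guess.

```python
import xml.etree.ElementTree as ET


def actual_power_state(ls_server_xml):
    """Map lsServer operState to 'on' or 'off'.

    Only "power-off" is known from this report; treating every other
    value as 'on' is an assumption made for this sketch.
    """
    oper_state = ET.fromstring(ls_server_xml).get("operState")
    if oper_state is None:
        raise ValueError("lsServer element has no operState attribute")
    return "off" if oper_state == "power-off" else "on"


def desired_power_state(ls_power_xml):
    """Map lsPower state (the desired power state) to 'on' or 'off'."""
    return "on" if ET.fromstring(ls_power_xml).get("state") == "up" else "off"


# After commissioning: the desired state is still "up", while the
# service profile's operational state is "power-off".
ls_power = '<lsPower dn="org-root/ls-B200M3_SND/power" state="up"/>'
ls_server = '<lsServer dn="org-root/ls-B200M3_SND" operState="power-off"/>'

print(desired_power_state(ls_power))  # on
print(actual_power_state(ls_server))  # off
```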

no longer affects: maas/2.0
summary: - [2.0] MAAS cannot power query with Cisco UCSM power driver
+ [2.1] MAAS cannot power query with Cisco UCSM power driver
Changed in maas:
status: Fix Committed → Fix Released