API takes too long to return an object involving power actions (and may prevent adding machines)

Bug #1547277 reported by Andres Rodriguez
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
Expired
Critical
Unassigned

Bug Description

When MAAS can't power manage a machine, it shouldn't prevent the machine from being added with power management details.

MAAS should:

1. Allow the machine to be added.
2. Query all RAck Controllers to see which ones can power manage a machine.
3. If no Rack Controller can power manage it, then it should prevent the deployment of it.
4. MAAS should be very clear that the machine cannot be power managed because no rack can connect to it.

summary: - MAAS prevents adding a machine when it cannot power manage it.
+ [2.0] MAAS prevents adding a machine when it cannot power manage it.
Changed in maas:
importance: Undecided → Critical
Gavin Panella (allenap)
Changed in maas:
status: New → Triaged
Changed in maas:
milestone: none → 2.1.0
tags: added: error-surface
Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Changed in maas:
importance: Critical → High
milestone: 2.1.3 → none
milestone: none → 2.4.0beta2
Changed in maas:
milestone: 2.4.0beta2 → 2.4.0beta3
Changed in maas:
importance: High → Critical
milestone: 2.4.0beta3 → 2.4.0rc1
Revision history for this message
Andres Rodriguez (andreserl) wrote : Re: [2.0] MAAS prevents adding a machine when it cannot power manage it.

I believe that this issue may now be gone, but there's a more critical issue. I'm trying to add a machine over the API, and it seems that MAAS waits to power manage the machine, or similar before it returns the object to the user.

I run the following command:

time maas admin machines create hostname=node05 domain=maaslab architecture=amd64 mac_addresses=b8:ae:ed:7c:4d:75 power_type=amt power_parameters_power_pass=Password1+ power_parameters_power_address=192.168.100.15

Almost immediately after I run the command, I see the following:

==> /var/log/maas/maas.log <==
Apr 20 22:38:09 localhost maas.api: [info] node05: Enlisted new machine
Apr 20 22:38:09 localhost maas.node: [info] node05: Status transition from NEW to COMMISSIONING
Apr 20 22:38:09 localhost maas.power: [info] Changing power state (on) of node: node05 (4agpxc)

However, the API doesn't return the object immediately, it took 45 seconds to return the object. ABout 40 seconds later from when MAAS added the logs.

SO I believe this is a related issue to the above described. I believe the power management is blocking the object from being returned immediately.

Revision history for this message
Andres Rodriguez (andreserl) wrote :
Download full text (3.9 KiB)

After trying to add a machine that I know will know exist, I can confirm this theory:

ubuntu@maas00:~$ time maas admin machines create hostname=test1 domain=maaslab architecture=amd64 mac_addresses=b8:ae:ed:7c:4d:72 power_type=amt power_parameters_power_pass=Password1+ power_parameter
s_power_address=192.168.100.133
[...]
real 0m57.302s
user 0m2.086s
sys 0m0.141s

Effectively, once you execute the command, it waits for the machine to see if it can power on before returning the object.

==> /var/log/maas/maas.log <==
Apr 20 22:43:03 localhost maas.power: [error] Error changing power state (on) of node: test1 (sewabs)
Apr 20 22:43:03 localhost maas.node: [info] test1: Status transition from COMMISSIONING to FAILED_COMMISSIONING
Apr 20 22:43:03 localhost maas.node: [error] test1: Marking node failed: Power on for the node failed: Could not contact node's BMC: Could not connect to BMC. Check BMC configuration and try again.

==> /var/log/maas/rackd.log <==
2018-04-20 22:43:03 provisioningserver.rpc.power: [critical] test1: Power on failed.
        Traceback (most recent call last):
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 459, in callback
            self._startRunCallbacks(result)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 567, in _startRunCallbacks
            self._runCallbacks()
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 653, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1442, in gotResult
            _inlineCallbacks(r, g, deferred)
        --- <exception caught here> ---
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1384, in _inlineCallbacks
            result = result.throwExceptionIntoGenerator(g)
          File "/usr/lib/python3/dist-packages/twisted/python/failure.py", line 408, in throwExceptionIntoGenerator
            return g.throw(self.type, self.value, self.tb)
          File "/usr/lib/python3/dist-packages/provisioningserver/rpc/power.py", line 287, in change_power_state
            system_id, hostname, power_type, power_change, context)
          File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1386, in _inlineCallbacks
            result = g.send(result)
          File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/__init__.py", line 418, in perform_power
            raise exc_info[0](exc_info[1]).with_traceback(exc_info[2])
          File "/usr/lib/python3/dist-packages/provisioningserver/drivers/power/__init__.py", line 379, in perform_power
            power_func, system_id, context)
          File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 250, in inContext
            result = inContext.theWork()
          File "/usr/lib/python3/dist-packages/twisted/python/threadpool.py", line 266, in <lambda>
            inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
          File "/usr/lib/python3/dist-packages/twisted/python/context.py", line 122, in callWithContext
    ...

Read more...

summary: - [2.0] MAAS prevents adding a machine when it cannot power manage it.
+ API takes too long to return an object involving power actions (and may
+ prevent adding machines)
tags: added: performance
Changed in maas:
assignee: nobody → Lee Trager (ltrager)
Revision history for this message
Andres Rodriguez (andreserl) wrote :

Seems this issue has now been fixed with: https://bugs.launchpad.net/maas/+bug/1761600

Changed in maas:
status: Triaged → Incomplete
assignee: Lee Trager (ltrager) → nobody
Changed in maas:
milestone: 2.4.0rc1 → 2.4.0rc2
Revision history for this message
Adam Collard (adam-collard) wrote :

This bug has not seen any activity in the last 6 months, so it is being automatically closed.

If you are still experiencing this issue, please feel free to re-open.

MAAS Team

Changed in maas:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.