Conflicting timeouts for commissioning, perhaps other actions

Bug #1439945 reported by Christian Reis
This bug affects 2 people
Affects: MAAS
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

A user is trying to increase the commissioning timeout. I can see:

maasserver/models/node.py:

    def get_commissioning_time(self):
        """Return the commissioning time of this node (in seconds).

        This is the maximum time the commissioning is allowed to take.
        """
        # Return a *very* conservative estimate for now.
        return timedelta(minutes=20).total_seconds()

but I can also see:

maas/settings.py:

  # The duration, in minutes, after which we consider a commissioning node
  # to have failed and mark it as FAILED_COMMISSIONING.
  COMMISSIONING_TIMEOUT = 60

So which is it?
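
To make the mismatch concrete, here is a minimal sketch that puts both values in the same unit. The constant names AUTOMATIC_TIMEOUT_SECONDS and SETTINGS_TIMEOUT_SECONDS are illustrative only and do not exist in MAAS; only the 20-minute and 60-minute values come from the two excerpts above.

```python
from datetime import timedelta

# Value returned by get_commissioning_time() in the excerpt above (seconds).
AUTOMATIC_TIMEOUT_SECONDS = timedelta(minutes=20).total_seconds()

# COMMISSIONING_TIMEOUT in maas/settings.py is expressed in minutes,
# so it has to be converted before the two can be compared.
COMMISSIONING_TIMEOUT = 60
SETTINGS_TIMEOUT_SECONDS = timedelta(minutes=COMMISSIONING_TIMEOUT).total_seconds()

# Difference between the two limits, in minutes.
gap_minutes = (SETTINGS_TIMEOUT_SECONDS - AUTOMATIC_TIMEOUT_SECONDS) / 60

print(AUTOMATIC_TIMEOUT_SECONDS, SETTINGS_TIMEOUT_SECONDS, gap_minutes)
```

If the 20-minute value is the one that is actually enforced, raising COMMISSIONING_TIMEOUT alone would presumably not give the user a longer commissioning window.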

Raphaël Badin (rvb) wrote :

It seems we have two ways in which a node can time out:

a) the timeout can be automatic: i.e. a timer is started at the beginning of the commissioning and will both power down the node and mark it "commissioning failed" if it doesn't commission within 20 minutes.

b) one can call the nodes API 'check_commissioning' to get all the nodes that have been commissioning for more than 60 minutes by default and mark them 'failed commissioning' (note that this won't power down the nodes).

It seems b) was added to cope with failing commissioning before a) was introduced as a more generic way to handle timeouts (deployment has the same "automatic timeout" mechanism).

It seems to me that b) is deprecated and fairly useless: commissioning can be aborted if need be.
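
The difference between a) and b) can be sketched with a toy model. This is not MAAS code: Node, start_commissioning, automatic_timeout, this version of check_commissioning and the status strings are hypothetical stand-ins, and only the 20-minute and 60-minute values are taken from the bug description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

AUTOMATIC_TIMEOUT = timedelta(minutes=20)  # value from get_commissioning_time()
CHECK_TIMEOUT = timedelta(minutes=60)      # value from COMMISSIONING_TIMEOUT


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a MAAS node."""
    name: str
    status: str = "NEW"
    powered_on: bool = False
    started: Optional[datetime] = None


def start_commissioning(node: Node, now: datetime) -> None:
    node.status = "COMMISSIONING"
    node.powered_on = True
    node.started = now


def automatic_timeout(node: Node, now: datetime) -> None:
    """a) Timer-driven: mark the node failed AND power it down."""
    if node.status == "COMMISSIONING" and now - node.started >= AUTOMATIC_TIMEOUT:
        node.status = "FAILED_COMMISSIONING"
        node.powered_on = False


def check_commissioning(nodes: List[Node], now: datetime) -> List[Node]:
    """b) Explicit call: mark overdue nodes failed, leave power untouched."""
    overdue = [
        n for n in nodes
        if n.status == "COMMISSIONING" and now - n.started > CHECK_TIMEOUT
    ]
    for n in overdue:
        n.status = "FAILED_COMMISSIONING"
    return overdue
```

In this model a node handled by a) ends up failed and powered off after 20 minutes, while a node that is only ever caught by b) stays powered on and is marked failed only when someone makes the call after 60 minutes.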

Changed in maas:
status: New → Triaged
importance: Undecided → High
mahmoh (mahmoh) wrote :

The customer impact: booting a system with 1TB of memory takes several minutes, and I have a customer who is working around this in the field now because commissioning fails at 20m.

Christian Reis (kiko) wrote :

Thanks for the analysis, Raphael. Precise as usual.

So check_commissioning() needs to be explicitly invoked by an API caller? And the nodes get marked as failed commissioning as a side-effect of that? I wonder if this was added to support embedding of MAAS.

Andres Rodriguez (andreserl) wrote :

We believe this is no longer an issue in the latest versions of MAAS. MAAS now tracks the different commissioning actions and keeps updating the timeouts while it continues to perform actions.

As such, I'm marking this as Fix Released. If you believe this is still an issue, please re-open a bug report.
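
One way to picture the behaviour described in this comment is a deadline that is pushed back every time a commissioning action reports progress. The sketch below only illustrates that general technique and is not the actual MAAS implementation; SlidingTimeout and its methods are hypothetical names, and the 20-minute window is borrowed from the original report purely as an example.

```python
from datetime import datetime, timedelta


class SlidingTimeout:
    """Hypothetical deadline that moves forward whenever progress is reported."""

    def __init__(self, window: timedelta, now: datetime) -> None:
        self.window = window
        self.deadline = now + window

    def report_progress(self, now: datetime) -> None:
        # Each completed action restarts the window from the current time.
        self.deadline = now + self.window

    def expired(self, now: datetime) -> bool:
        return now >= self.deadline


start = datetime(2015, 4, 1, 12, 0, 0)
timeout = SlidingTimeout(timedelta(minutes=20), start)

# An action finishes 15 minutes in, so the deadline moves to minute 35.
timeout.report_progress(start + timedelta(minutes=15))
still_running = not timeout.expired(start + timedelta(minutes=30))
```

With a fixed 20-minute limit the node in this example would have been failed at minute 20; with the sliding window it is only failed if no action reports progress for a full 20 minutes.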

Changed in maas:
status: Triaged → Fix Released