[2.1b2] failure to deploy all systems that lasted until reboot of maas container

Bug #1631421 reported by Larry Michel
This bug affects 1 person
Affects  Status   Importance  Assigned to  Milestone
MAAS     Invalid  Undecided   Unassigned

Bug Description

We started seeing deployment failures on our MAAS container: systems would show a PXE installation request, but it did not look like the kernel was ever booted. This was with 2.0 beta1.

There were no rsyslog entries after the failures were observed, and the event logs would show the PXE installation request followed by a timeout:

 Node changed status - From 'Deploying' to 'Failed deployment' Fri, 07 Oct. 2016 13:20:36
Marking node failed - Machine operation 'Deploying' timed out after 40 minutes. Fri, 07 Oct. 2016 13:20:36
PXE Request - installation Fri, 07 Oct. 2016 12:39:06
PXE Request - installation Fri, 07 Oct. 2016 12:39:06
Node powered on Fri, 07 Oct. 2016 12:38:40
Powering node on

This may be due to bug 1631403, which we saw simultaneously, but I am opening this bug in case they are separate issues.

After upgrading to Beta2 the deployment issues continued. Then, after rebooting the container, the issue went away.

$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===============================-====================================-============-=================================================
ii maas 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all "Metal as a Service" is a physical cloud and IPAM
ii maas-cli 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS client and command-line interface
un maas-cluster-controller <none> <none> (no description available)
ii maas-common 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS server common files
ii maas-dhcp 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS DHCP server
ii maas-dns 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS DNS server
ii maas-proxy 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS Caching Proxy
ii maas-rack-controller 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all Rack Controller for MAAS
ii maas-region-api 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all Region controller API service for MAAS
ii maas-region-controller 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all Region Controller for MAAS
un maas-region-controller-min <none> <none> (no description available)
un python-django-maas <none> <none> (no description available)
un python-maas-client <none> <none> (no description available)
un python-maas-provisioningserver <none> <none> (no description available)
ii python3-django-maas 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS server Django web framework (Python 3)
ii python3-maas-client 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS python API client (Python 3)
ii python3-maas-provisioningserver 2.1.0~beta2+bzr5454-0ubuntu1~16.04.1 all MAAS server provisioning libraries (Python 3)

Tags: oil
Larry Michel (lmic) wrote :
summary: - [2.1] failure to deploy all systems that lasted until reboot
+ [2.1] failure to deploy all systems that lasted until reboot of maas
+ container
Andres Rodriguez (andreserl) wrote :

I wonder if this is related to a network issue that was solved by rebooting the container (maybe related to iscsi).

summary: - [2.1] failure to deploy all systems that lasted until reboot of maas
+ [2.1b2] failure to deploy all systems that lasted until reboot of maas
container
Changed in maas:
milestone: none → 2.1.0
Blake Rouse (blake-rouse) wrote :

Next time this happens, please provide the systemd status of all services that MAAS uses. That will help us see which service was down and caused this issue.

sudo systemctl status maas-regiond maas-rackd maas-proxy tgtd maas-dhcpd maas-dhcpd6 bind9
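
For a quick one-line-per-service summary alongside the full status output, something like the following might help. This is only a sketch: the service list is copied from the command above, while the function name `report_maas_services` and the `SYSTEMCTL` override variable are made up for illustration and are not part of MAAS.

```shell
#!/bin/sh
# Services copied from the systemctl command above.
MAAS_SERVICES="maas-regiond maas-rackd maas-proxy tgtd maas-dhcpd maas-dhcpd6 bind9"

# Print one "service: state" line per service.
# SYSTEMCTL is a hypothetical override (not a MAAS or systemd setting) that
# lets the function be pointed at a stand-in command for dry runs.
report_maas_services() {
    for svc in $MAAS_SERVICES; do
        # "systemctl is-active" prints the unit state and exits non-zero
        # when the unit is not active, so the exit code is ignored here.
        state=$("${SYSTEMCTL:-systemctl}" is-active "$svc" 2>/dev/null) || true
        printf '%s: %s\n' "$svc" "${state:-unknown}"
    done
}
```

Running `report_maas_services > maas-services.txt` at the time of a failure and attaching the file would give a compact view of which units were not active. It does not replace the full `systemctl status` output, which also includes recent log lines.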

Changed in maas:
status: New → Incomplete
Changed in maas:
milestone: 2.1.0 → 2.1.1
Changed in maas:
milestone: 2.1.1 → 2.1.2
Changed in maas:
milestone: 2.1.2 → 2.1.3
Jason Hobbs (jason-hobbs) wrote :

We have not seen this again, AFAIK. Not sure why this one hasn't expired yet.

Changed in maas:
status: Incomplete → Invalid
Jason Hobbs (jason-hobbs) wrote :

Marked as invalid since it's old/incomplete.
