commissioning failing due to no-such-image in boot string

Bug #1459866 reported by Dan Poler
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
MAAS | Won't Fix | Critical | Unassigned |

Bug Description

MAAS Version 1.8.0 (beta8+bzr3951)
The image kinda explains everything...
Sometimes you force the VM off and back on and it boots fine. I've also had to abort commissioning and re-commission to get it to work.

ubuntu@maas-server:~$ dpkg -l '*maas*'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================================-====================================-============-===============================================================================
ii maas 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server all-in-one metapackage
ii maas-cli 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS command line API tool
ii maas-cluster-controller 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server cluster controller
ii maas-common 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server common files
ii maas-dhcp 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS DHCP server
ii maas-dns 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS DNS server
ii maas-proxy 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS Caching Proxy
ii maas-region-controller 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server complete region controller
ii maas-region-controller-min 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS Server minimum region controller
ii python-django-maas 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server Django web framework
ii python-maas-client 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS python API client
ii python-maas-provisioningserver 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server provisioning libraries
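As a quick sanity check on a listing like the one above, the following sketch parses `dpkg -l` output and reports whether every installed package is at the same version. It is only an illustration: the function name `package_versions` is hypothetical, the sample input is copied from two lines of the listing, and nothing in this report establishes that a version mismatch is related to the failure.

```python
def package_versions(dpkg_output):
    """Map package name -> version for lines marked 'ii' (installed)."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages look like: ii <name> <version> <arch> <description...>
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


# Two lines copied from the listing above, used as sample input.
sample = """\
ii maas 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS server all-in-one metapackage
ii maas-cli 1.8.0~beta8+bzr3951-0ubuntu1~trusty1 all MAAS command line API tool
"""

versions = package_versions(sample)
distinct = set(versions.values())
if len(distinct) == 1:
    print("all packages at the same version")
else:
    print("mixed versions:", sorted(distinct))
```

Feeding it the full listing from this report would show a single version, 1.8.0~beta8+bzr3951-0ubuntu1~trusty1, across all twelve packages.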

Tags: cpe-sa
Dan Poler (l-dan) wrote :

Seems to be random. Of four VMs I just commissioned:

1 had this happen; forcing the VM off and restarting it worked
1 had this happen; it recurred after a force reset, but forcing the VM off and restarting it worked
1 had this happen; two force-off-and-restart cycles did not help, but aborting commissioning and recommissioning worked
1 commissioned normally with no intervention

Andres Rodriguez (andreserl) wrote :

Hi Dan,

I wonder if this is due to the configured bridge causing timeouts, which was a known issue on Orange Boxes:

2015-05-28 17:02:42-0600 [-] (UDP Port 43952 Closed)
2015-05-28 17:02:42-0600 [-] Stopping protocol <tftp.bootstrap.RemoteOriginReadSession instance at 0x7f94c6a18fc8>
2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49154): <RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel, mode=octet, options={'tsize': '0', 'blksize': '1408'})>
2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49155): <RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel.cbt, mode=octet, options={'tsize': '0', 'blksize': '1408'})>
2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49156): <RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel.0, mode=octet, options={'tsize': '0', 'blksize': '1408'})>
2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49157): <RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel.com, mode=octet, options={'tsize': '0', 'blksize': '1408'})>
2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49158): <RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel.c32, mode=octet, options={'tsize': '0', 'blksize': '1408'})>
2015-05-28 17:02:48-0600 [-] Timed during option negotiation process
2015-05-28 17:02:48-0600 [-] Timed during option negotiation process
2015-05-28 17:02:48-0600 [-] Timed during option negotiation process
2015-05-28 17:02:48-0600 [-] Timed during option negotiation process
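To spot this symptom in a longer log, a small script can pull out the TFTP read requests whose path contains the `no-such-image` component. The sketch below is an illustration only: the regular expression is written against the line format shown in the excerpt above, and the function name `find_missing_image_requests` is hypothetical rather than part of MAAS.

```python
import re

# Matches the RRQ lines as they appear in the log excerpt above.
RRQ_RE = re.compile(
    r"Datagram received from \('(?P<ip>[^']+)', (?P<port>\d+)\): "
    r"<RRQDatagram\(filename=(?P<filename>[^,]+),"
)


def find_missing_image_requests(log_lines, marker="no-such-image"):
    """Return (client_ip, filename) pairs for read requests whose path has the marker component."""
    hits = []
    for line in log_lines:
        match = RRQ_RE.search(line)
        if match and marker in match.group("filename").split("/"):
            hits.append((match.group("ip"), match.group("filename")))
    return hits


# Two lines copied from the excerpt above, used as sample input.
sample = [
    "2015-05-28 17:02:42-0600 [TFTP (UDP)] Datagram received from ('10.27.0.53', 49154): "
    "<RRQDatagram(filename=ubuntu/amd64/generic/trusty/no-such-image/boot-kernel, "
    "mode=octet, options={'tsize': '0', 'blksize': '1408'})>",
    "2015-05-28 17:02:48-0600 [-] Timed during option negotiation process",
]

for ip, filename in find_missing_image_requests(sample):
    print(ip, filename)
```

Grouping the hits by client address would show which machines were handed a boot path without a valid image component.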

Dan,

can you check whether the cluster gets disconnected from the region when this happens (check the cluster page)?

Changed in maas:
milestone: none → 1.8.0
importance: Undecided → Critical
Dan Poler (l-dan) wrote :

I'll check the cluster status if it happens again, but this is not an Orange Box; these are all just KVM VMs on a single host.

Dan Poler (l-dan) wrote :

Just had it happen again (this time during a deploy), and the cluster showed as connected to the region at that time (green check mark on the "Clusters" screen).

Raphaël Badin (rvb) wrote :

Looks like the cluster is disconnected and thus MAAS is unable to find a suitable image.

Dan Poler (l-dan) wrote :

Cluster/Region are on the same machine and there's nothing in the GUI to indicate the disconnection... It happens randomly, so it'll fail then work ten seconds later... I think there's something else at play here...

Dan Poler (l-dan) wrote :

This behaviour has mysteriously disappeared... Two things have changed: I've upgraded to MAAS Version 1.8.0 (rc1+bzr3972) and I'm on the office network. It is unclear whether either change is related. This can be closed, and I'll reopen it if it occurs again.

Changed in maas:
status: New → Incomplete
Anmar Salih (anmar-zubier) wrote :

Hi Andres,
I have the same issue. Did you find a solution?

Thanks

Данило Шеган (danilo) wrote :

@Anmar, please provide more details: what version of MAAS, what's the environment, and the log files?

Changed in maas:
status: Incomplete → Triaged
status: Triaged → Incomplete
Andres Rodriguez (andreserl) wrote :

Dear user,

This is an automated message.

We believe this bug report is no longer an issue in the latest version of MAAS. For this reason, we are marking this issue as Won't Fix. If you believe this issue is still present in the latest version of MAAS, please re-open this bug report.

Changed in maas:
status: Incomplete → Won't Fix